PhD Oral Thesis Defense: Assumption-lean Inference for Network-Linked Data
This dissertation studies statistical inference for network-linked data, with a focus on how network structure affects estimation and uncertainty quantification. The first part considers regression problems in which node-level covariates are derived from an observed network, such as local subgraph frequencies and spectral embeddings. It develops an assumption-lean framework for linear regression with jointly exchangeable regression arrays, where inferential targets remain well defined under model misspecification. Within this framework, the dissertation establishes asymptotic normality and bootstrap consistency, identifies a distinctive bias arising from noisy network covariates, and develops bias-corrected and resampling-based procedures for valid inference under challenging sparsity regimes.
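For readers unfamiliar with network-derived covariates, the following is a minimal, hypothetical sketch. It is not the dissertation's procedure and includes no bias correction or bootstrap. It assumes a simple undirected graph given as a symmetric 0/1 adjacency matrix. It computes two common kinds of node-level covariates: per-node triangle counts, as one example of a local subgraph frequency, and an adjacency spectral embedding. It then fits ordinary least squares on synthetic data. The helper names `network_covariates` and `ols`, the Erdős–Rényi graph, and the simulated response are all illustrative assumptions.

```python
import numpy as np

def network_covariates(A, d=2):
    """Hypothetical helper: node-level covariates from a symmetric 0/1 adjacency matrix."""
    A = np.asarray(A, dtype=float)
    degree = A.sum(axis=1)
    # Triangles through node i in a simple undirected graph: diag(A^3)_i / 2.
    triangles = np.diag(A @ A @ A) / 2.0
    # Adjacency spectral embedding: top-d eigenvectors (ranked by |eigenvalue|),
    # each scaled by sqrt(|eigenvalue|). Eigenvector signs are arbitrary.
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[::-1][:d]
    ase = vecs[:, top] * np.sqrt(np.abs(vals[top]))
    return degree, triangles, ase

def ols(X, y):
    """Least-squares coefficients with an intercept column prepended."""
    Z = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n, p = 200, 0.1
# Illustrative Erdős–Rényi graph: Bernoulli(p) upper triangle, then symmetrize.
U = np.triu(rng.random((n, n)) < p, k=1)
A = (U | U.T).astype(float)

degree, triangles, ase = network_covariates(A, d=2)
# Synthetic response depending on the observed triangle count plus Gaussian noise.
y = 1.0 + 0.5 * triangles + rng.normal(size=n)
beta = ols(np.column_stack([degree, triangles]), y)
print(beta)  # [intercept, degree coefficient, triangle coefficient]
```

In this toy example the covariates are computed from the same graph that is observed. In the setting the dissertation describes, such covariates are noisy proxies for latent node-level quantities, and that noise is the source of the bias mentioned above.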
The second part studies a network spatial autoregression model in which dependence is built directly into the data-generating process through the observed graph. In this setting, the dissertation establishes a central limit theorem under weak conditions by analyzing the dependence structure through powers of network operators and walk-based representations. It also highlights several fundamental challenges for inference when dependence is intrinsic to the response process itself.
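As background for the phrase "powers of network operators and walk-based representations", consider the standard spatial autoregressive form. Assume a weight matrix W built from the observed graph, for example the row-normalized adjacency matrix, together with a scalar autoregressive parameter ρ. The exact specification used in the dissertation may differ. When |ρ| ‖W‖ < 1 in some operator norm, the model can be inverted as a Neumann series:

```latex
Y = \rho W Y + X\beta + \varepsilon
\quad\Longrightarrow\quad
Y = (I - \rho W)^{-1}(X\beta + \varepsilon)
  = \sum_{k=0}^{\infty} \rho^{k} W^{k} (X\beta + \varepsilon),
\qquad |\rho|\,\|W\| < 1 .
```

The entry (W^k)_{ij} sums the products of edge weights over all walks of length k from node i to node j. For an unweighted adjacency matrix A, (A^k)_{ij} simply counts such walks. Each response therefore depends on the noise at every node reachable by a walk, with weights that decay geometrically in the walk length. For a row-normalized W, ‖W‖_∞ = 1, so |ρ| < 1 is sufficient for the series to converge.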
Together, these results provide a unified perspective on inference for network-linked data, spanning both regression with network-derived covariates and models with direct network dependence.
Thesis Advisor: Robert Lunde