Assumption-lean Inference for Network-linked Data
We consider statistical inference for network-linked regression problems, where covariates may include network summary statistics computed for each node. In settings involving network data, it is often natural to posit that latent variables govern connection probabilities in the graph. Since the presence of these latent features makes classical regression assumptions even less tenable, we propose an assumption-lean framework for linear regression with network-linked data.
We study two projection parameters as potential inferential targets and establish conditions under which asymptotic normality and bootstrap consistency hold when commonly used network statistics, such as local subgraph frequencies and random dot product graph embeddings, are used as covariates. In the case of linear regression with local count statistics, we show that a bias-corrected estimator allows one to target a more natural inferential parameter under weaker sparsity conditions than the ordinary least squares (OLS) estimator requires. Our inferential tools are illustrated using both simulated data and real data related to the academic climate of elementary schools.
Advisor: Robert Lunde
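
To make the setup concrete, the following is a minimal sketch of a network-linked regression in which local count statistics serve as covariates. It is an illustration only, not the estimator or the data-generating model studied in this work: the rank-one latent position model, the response equation, the normalizations by n and n^2, and the function names local_counts, ols, and simulate are all assumptions introduced here. The sketch computes plain OLS coefficients and does not implement the bias correction or the bootstrap.

```python
import numpy as np


def local_counts(A):
    """Per-node degree and triangle counts from a symmetric 0/1 adjacency matrix."""
    A = np.asarray(A, dtype=float)
    degree = A.sum(axis=1)
    # For symmetric A, sum_j (A @ A)[i, j] * A[i, j] equals diag(A^3)[i],
    # which counts each triangle through node i twice.
    triangles = ((A @ A) * A).sum(axis=1) / 2.0
    return degree, triangles


def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=200, seed=0):
    """Hypothetical example: a latent variable drives both edges and the response."""
    rng = np.random.default_rng(seed)
    latent = rng.uniform(0.2, 0.8, size=n)      # assumed latent positions
    P = np.outer(latent, latent)                # assumed connection probabilities
    upper = np.triu(rng.random((n, n)) < P, k=1)
    A = (upper | upper.T).astype(float)         # symmetric, no self-loops
    degree, triangles = local_counts(A)
    # Assumed response model; the latent variable itself is never observed.
    y = 1.0 + 2.0 * latent + rng.normal(scale=0.5, size=n)
    # Covariates: intercept plus normalized local counts (scaling is an assumption).
    X = np.column_stack([np.ones(n), degree / n, triangles / n**2])
    return A, X, y, ols(X, y)
```

Because the response depends on the unobserved latent variable rather than on the counts themselves, the fitted coefficients are best read as estimates of a projection parameter, not of a correctly specified linear model.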