Master's Thesis Defense: Conformal Prediction for Spatial Count Data under Dependence: A Conclique-Based Calibration Approach


Lingyuan Zhao, Master's Student in Statistics & Data Science at Washington University in St. Louis

Constructing valid prediction intervals for spatial count data is challenging because of overdispersion and spatial dependence, both of which violate the exchangeability assumption of classical conformal inference. In this thesis, we examine when conformal prediction can still provide valid results under spatial dependence.

Under general conditions such as spatial mixing and stationarity, we show that conformal prediction remains asymptotically valid under weak spatial dependence. This result provides a theoretical justification for applying conformal inference beyond the exchangeable setting. We further examine the Markov random field setting as a structured special case; there, using conditional exchangeability within concliques, we show finite-sample validity in an oracle case.

Motivated by these results, we propose a conclique-based calibration method. The idea is to use the graph structure of spatial data to reduce local dependence in the calibration set. This helps to make conformal inference more stable in small samples, while remaining consistent with the general asymptotic theory. To implement this method with spatial count data, we combine a splitting strategy with a spatially informed, global negative binomial model that accommodates overdispersion and high-dimensional covariates. 
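As a rough illustration of the calibration idea described above, the following minimal Python sketch draws the calibration scores from a single conclique of a regular lattice. It is not the thesis implementation: the 4-nearest-neighbour checkerboard neighbourhood, the absolute-residual score, the function names, and the use of a pre-fitted mean surface mu_hat in place of the negative binomial model are all assumptions made for illustration.

```python
import numpy as np


def checkerboard_concliques(n_rows, n_cols):
    """Label sites of a 4-nearest-neighbour lattice by the parity of row + col.

    Sites sharing a label are never neighbours of each other, so each
    label class is a conclique (assumed neighbourhood structure).
    """
    rows, cols = np.indices((n_rows, n_cols))
    return (rows + cols) % 2


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    scores = np.sort(np.asarray(scores, dtype=float).ravel())
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points the interval is unbounded.
    return np.inf if k > n else scores[k - 1]


def conclique_split_interval(y, mu_hat, calib_mask, alpha=0.1):
    """Split-conformal intervals calibrated on the sites in calib_mask.

    mu_hat is a hypothetical fitted mean surface, assumed to come from a
    model trained on sites outside the calibration set.
    """
    scores = np.abs(y[calib_mask] - mu_hat[calib_mask])
    q = conformal_quantile(scores, alpha)
    # Counts are non-negative integers, so round the bounds inward.
    lower = np.maximum(0, np.ceil(mu_hat - q))
    upper = np.floor(mu_hat + q)
    return lower, upper, q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = checkerboard_concliques(20, 20)
    mu, r = 5.0, 2.0  # illustrative mean and dispersion
    # Independent overdispersed counts; this only exercises the mechanics.
    y = rng.negative_binomial(r, r / (r + mu), size=labels.shape).astype(float)
    mu_hat = np.full(labels.shape, y[labels == 1].mean())
    lower, upper, q = conclique_split_interval(y, mu_hat, labels == 0, alpha=0.1)
    print("calibration quantile:", q)
```

The demo simulates independent counts, so it shows only how the conclique mask enters the calibration step; it does not reproduce the dependence settings studied in the thesis.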

The simulation studies demonstrate the effect of spatial dependence on conformal calibration. The results suggest that conformal prediction remains approximately valid under weak dependence. Conclique-based splitting performs better in small samples, and the difference shrinks as the sample size increases.

We also apply our method to U.S. county-level crime data, a realistic setting with overdispersion, spatial heterogeneity, and high-dimensional predictors. Our results show that the proposed framework remains stable in practice and produces reasonable prediction intervals under complex dependence structures. The empirical behavior is consistent with the simulation findings: differences between splitting strategies are more pronounced in smaller samples and diminish as the effective sample size increases.

Thesis Advisors: Robert Lunde and Debjoy Thakur