Graduate Student Seminar Series Presents: Local Variable Selection for High-Dimensional Spatial Data
In many spatial studies, variable selection must account for spatial heterogeneity: covariates that are influential at one location may be irrelevant elsewhere. Traditional selection methods such as LASSO, adaptive LASSO, SCAD, and group LASSO do not address this problem because they assume that the active set of relevant variables is the same at every location. In this talk, I present a unified framework for local variable selection in high-dimensional spatial data, designed to recover location-specific sparsity while borrowing strength across nearby locations. The key idea is to construct local penalized estimators that encourage nearby locations to share similar sets of non-zero covariates without enforcing global inclusion or exclusion. To capture spatially varying signals efficiently, I represent the coefficient functions as multi-resolution wavelet expansions, in which any non-zero spatial effect is detected through at least one active coefficient at some resolution. This leads to a flexible local LASSO framework, which I further extend to nonconvex penalties such as SCAD to reduce estimation bias and improve variable selection accuracy. I establish variable selection consistency for the proposed local methods under increasing-domain asymptotics with a fixed spatial design and a diverging number of predictors. The framework applies broadly: to spatial regression models, to intensity estimation for spatial point processes, and to local penalized-likelihood methods for areal data. I illustrate the practical advantages of the proposed methodology using real-world spatial data, including U.S. election outcomes, urban crime and accident occurrences in St. Louis, and county-level firearm fatality or gun-violence data.
These examples show how local variable selection uncovers clear spatial patterns, highlighting where certain factors matter and how their effects vary across regions, while outperforming global penalized methods.
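
For readers new to the idea of location-specific sparsity, the following is a minimal sketch and not the speaker's method: it swaps in a plain Gaussian-kernel-weighted LASSO, solved by coordinate descent, in place of the wavelet-based local penalized estimators described above. The function names (`soft_threshold`, `local_lasso`), the one-dimensional locations, the bandwidth, the penalty level, and the synthetic noiseless data are all assumptions made for illustration. The first covariate affects the response only at locations below 0.5, and the second affects it everywhere.

```python
import math


def soft_threshold(z, lam):
    """Soft-thresholding operator: the closed-form LASSO update for one coordinate."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def local_lasso(X, y, sites, s0, bandwidth, lam, n_iter=200):
    """Kernel-weighted LASSO at target location s0 (illustrative stand-in only).

    Minimizes 0.5 * sum_i w_i * (y_i - x_i' beta)^2 + lam * ||beta||_1,
    where w_i are normalized Gaussian kernel weights centered at s0.
    """
    # Observations near s0 dominate the local fit; distant ones get ~0 weight.
    w = [math.exp(-0.5 * ((s - s0) / bandwidth) ** 2) for s in sites]
    total = sum(w)
    w = [wi / total for wi in w]

    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # weighted covariance of covariate j with partial residual
            norm = 0.0  # weighted second moment of covariate j
            for wi, xi, yi in zip(w, X, y):
                partial = yi - sum(xi[k] * beta[k] for k in range(p) if k != j)
                rho += wi * xi[j] * partial
                norm += wi * xi[j] ** 2
            beta[j] = soft_threshold(rho, lam) / norm
    return beta


# Synthetic, noiseless data on a 1-D transect (assumed for illustration).
n = 200
sites = [i / (n - 1) for i in range(n)]
X = [[math.sin(37 * i), math.cos(91 * i)] for i in range(n)]
# Covariate 0 matters only where s < 0.5; covariate 1 matters everywhere.
y = [(2.0 * x[0] if s < 0.5 else 0.0) + x[1] for x, s in zip(X, sites)]

beta_west = local_lasso(X, y, sites, s0=0.1, bandwidth=0.05, lam=0.1)
beta_east = local_lasso(X, y, sites, s0=0.9, bandwidth=0.05, lam=0.1)
print("active set near s=0.1:", [j for j, b in enumerate(beta_west) if b != 0.0])
print("active set near s=0.9:", [j for j, b in enumerate(beta_east) if b != 0.0])
```

In this toy setting the fit near 0.1 keeps both covariates, while the fit near 0.9 drops the first one, which is the kind of location-specific active set a single global penalized fit cannot express. Unlike the framework in the talk, this sketch fits each location independently and does not encourage neighboring locations to share active sets.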