Statistics and Data Science Seminar: Turning the data-integration dial: efficient inference from different data sources

Speaker: Emily Hector, North Carolina State University

Co-sponsored by TRIADS

Abstract: A fundamental aspect of statistics is the integration of data from different sources. Classically, Fisher and others focused on how to integrate homogeneous sets of data. More recently, the question of whether data sets from different sources should be integrated at all has become more relevant. The current literature treats this as a yes/no question: integrate or don't. Here we take a different approach, motivated by information-sharing principles from the shrinkage estimation literature. In particular, we depart from the binary, yes/no perspective and propose a dial parameter that controls the extent to which two data sources are integrated. How far this dial parameter should be turned is shown to depend on the informativeness of the different data sources, as measured by Fisher information. This more nuanced data integration framework leads to relatively simple parameter estimates and valid tests and confidence intervals. We demonstrate both theoretically and empirically that setting the dial parameter according to our recommendation leads to more efficient estimation than binary data integration schemes.

Bio: Emily Hector is an Assistant Professor of Statistics at North Carolina State University. She obtained her PhD in Biostatistics from the University of Michigan in 2020. Her research develops computationally and statistically efficient data integration approaches, with a particular focus on analyzing high-dimensional correlated data.

Host: Xuming He

A reception with light refreshments will follow the seminar, from 5:00 to 5:45.