Accurate and efficient data point removal for high-dimensional settings
Speaker: Arian Maleki, associate professor in the Department of Statistics at Columbia University.
Abstract: Consider a model with p parameters trained on n independent and identically distributed observations. To assess a data point’s impact on the model, we remove it from the dataset and ask how the model behaves when trained on the remaining data. This scenario arises in various classical and modern applications, including risk estimation, outlier detection, machine unlearning, and data valuation. The conventional approach is to retrain the model on the remaining data, which can be computationally demanding. Consequently, researchers often resort to approximate methods.
This talk highlights that in high-dimensional settings, where p is larger than n or of the same order as n, many approximation methods may prove ineffective. We will present and analyze an accurate approximation method tailored to high-dimensional regimes and elucidate the conditions under which it is accurate. Time permitting, we will conclude with a brief discussion of some of the unresolved issues in this domain.
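To make the removal problem in the abstract concrete, here is a minimal Python sketch for ridge regression, one special case where removing a point can be handled exactly without retraining. It is not the method presented in the talk. It only illustrates the gap between retraining n times and using a closed-form shortcut. The function names, the penalty `lam`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients: solve (X^T X + lam * I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def loo_residuals_retrain(X, y, lam):
    """Leave-one-out residuals by brute force: refit n times."""
    n = X.shape[0]
    residuals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i          # drop observation i
        beta_minus_i = ridge_fit(X[keep], y[keep], lam)
        residuals[i] = y[i] - X[i] @ beta_minus_i
    return residuals


def loo_residuals_shortcut(X, y, lam):
    """Leave-one-out residuals from a single fit.

    Uses the classical ridge identity
        y_i - x_i^T beta_{-i} = (y_i - x_i^T beta) / (1 - H_ii),
    where H = X (X^T X + lam * I)^{-1} X^T is the hat matrix.
    """
    p = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return (y - H @ y) / (1.0 - np.diag(H))


# Hypothetical simulated data with p larger than n.
rng = np.random.default_rng(0)
n, p, lam = 40, 60, 1.0
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) / np.sqrt(p) + 0.1 * rng.standard_normal(n)

r_exact = loo_residuals_retrain(X, y, lam)
r_fast = loo_residuals_shortcut(X, y, lam)
max_gap = float(np.max(np.abs(r_exact - r_fast)))
print("largest disagreement between the two computations:", max_gap)
```

For the quadratic loss with a ridge penalty, the two computations agree up to floating-point error. For other losses and penalties no such exact identity is available in general, so one falls back on approximations, and the accuracy of those approximations when p is comparable to or larger than n is the question the abstract raises.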
Bio: Arian Maleki is an associate professor in the Department of Statistics at Columbia University. Arian’s research interests include compressed sensing, computational imaging, machine learning, and high-dimensional statistics. Prior to his work at Columbia, Arian was a postdoctoral scholar at Rice University. He received his PhD from Stanford University, completing a dissertation on approximate message passing for compressed sensing. He completed his BSc degree in Electrical Engineering at Sharif University in Tehran, Iran.