Graduate Student Seminar: Kernel Bandwidth Selection for Maximum Mean Discrepancy
Distributional shifts between training and testing data can severely affect the performance of machine learning models, making their detection a critical task. The kernel two-sample test based on maximum mean discrepancy (MMD) is a widely adopted approach for this purpose. However, its effectiveness depends heavily on the choice of kernel bandwidth. This project investigates the influence of bandwidth on MMD performance and offers practical guidance for efficient bandwidth selection. Through extensive simulations, we assess the robustness and sensitivity of both the standard MMD and a Mahalanobis-aggregated variant under a variety of data-generating conditions. Multiple bandwidth selection strategies are evaluated and compared to inform best practices in real-time detection tasks.
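To make the bandwidth dependence concrete, the following is a minimal sketch of an unbiased squared-MMD estimate with a Gaussian kernel, evaluated at several bandwidths. It is an illustration under stated assumptions, not the project's implementation: the function names (`sq_dists`, `median_bandwidth`, `mmd2_unbiased`), the toy Gaussian data, and the use of the median heuristic as a reference bandwidth are all hypothetical choices made here. The Mahalanobis-aggregated variant is not shown.

```python
import numpy as np


def sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    aa = np.sum(a ** 2, axis=1)[:, None]
    bb = np.sum(b ** 2, axis=1)[None, :]
    # Clip tiny negative values caused by floating-point cancellation.
    return np.maximum(aa + bb - 2.0 * a @ b.T, 0.0)


def median_bandwidth(x, y):
    """Median heuristic (an assumed baseline): median pairwise distance in the pooled sample."""
    z = np.vstack([x, y])
    d2 = sq_dists(z, z)
    upper = np.triu_indices_from(d2, k=1)  # distinct pairs only
    return float(np.sqrt(np.median(d2[upper])))


def mmd2_unbiased(x, y, bandwidth):
    """Unbiased estimate of squared MMD with kernel exp(-||u - v||^2 / (2 * bandwidth^2))."""
    m, n = len(x), len(y)
    gamma = 1.0 / (2.0 * bandwidth ** 2)
    kxx = np.exp(-gamma * sq_dists(x, x))
    kyy = np.exp(-gamma * sq_dists(y, y))
    kxy = np.exp(-gamma * sq_dists(x, y))
    # Drop the diagonal terms k(x_i, x_i) so the within-sample averages are unbiased.
    np.fill_diagonal(kxx, 0.0)
    np.fill_diagonal(kyy, 0.0)
    return float(
        kxx.sum() / (m * (m - 1))
        + kyy.sum() / (n * (n - 1))
        - 2.0 * kxy.mean()
    )


# Toy data (assumed): a mean shift of 1.0 in every coordinate.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 2))
y = rng.normal(1.0, 1.0, size=(200, 2))

h_med = median_bandwidth(x, y)
for h in (0.01 * h_med, h_med, 100.0 * h_med):
    print(f"bandwidth={h:.3g}  MMD^2={mmd2_unbiased(x, y, h):.4g}")
```

In this sketch, very small and very large bandwidths both push the kernel matrices toward degenerate forms (near-identity and near-constant, respectively), so the estimate shrinks toward zero even though the two samples differ. Turning the statistic into a test would additionally require a null distribution, for example from permutations of the pooled sample.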