Topics in Statistics

STATISTICS AND DATA SCIENCE 496

Many scientific discoveries are now made by analyzing the increasingly large and noisy biological datasets produced by next-generation high-throughput technology. Machine learning methods that were developed to extract complex patterns from image, text and speech datasets are now regularly used to investigate conjectures in biology and medicine. The goal of this course is to review key concepts and methods in statistical learning and apply them to biological datasets. The course focuses on methods and applications rather than theory and is intended for a broad audience. We will first explore predictive algorithms that perform classification and regression based on training datasets, such as logistic regression, decision trees, random forests, boosting, naive Bayes classifiers, Gaussian process regression, linear discriminant analysis and support vector machines. As time allows, we will then review clustering algorithms and dimensionality reduction techniques used to identify patterns in large-scale biological datasets, such as hierarchical clustering, mixture models and principal component analysis. Prerequisites: SDS/Math 3200 or SDS/Math 3211, or a strong performance in SDS 2200 or Math 2200 and permission of the instructor.
Course Attributes: FA NSM; AS NSM

Section 01

INSTRUCTOR: Katsianos
View Course Listing - FL2024
View Course Listing - SP2025