Syllabus:
Ch. 1: Introduction
- Big Data and high-dimensional data;
- a statistical versus a machine learning approach.
Ch. 2: A–B–C
- linear regression;
- variable transformations;
- multivariate responses;
- computational aspects;
- likelihood-based approaches;
- logistic regression.
Ch. 3: Optimism, Conflicts, and Trade-offs
- optimism, overfitting and bias-variance trade-off;
- data split and cross-validation methods;
- bootstrapping;
- information-based criteria;
- methods for variable selection;
- principal component analysis and principal component regression;
- methods of regularization (mainly lasso and ridge regression).
Ch. 4: Prediction of Quantitative Variables
- k-nearest-neighbors and kernel-based methods;
- the curse of dimensionality;
- splines;
- additive models and generalized additive models;
- projection pursuit;
- regression trees;
- neural networks.
Ch. 5: Methods of Classification
- classification and performance evaluation;
- logistic regression;
- classification via linear regression;
- linear discriminant analysis;
- quadratic discriminant analysis;
- regularized approaches for classification;
- k-nearest-neighbors for classification;
- classification trees;
- neural networks for classification;
- support vector machines;
- bagging, AdaBoost and random forest (only general ideas).
Ch. 6: Methods of Internal Analysis
- cluster analysis;
- distances and dissimilarities;
- nonhierarchical methods;
- hierarchical methods.