All the numbered exercises are from the course book (ESL).
Exercise set 1
Pen-and-paper
Coding
- Problem 2.8: For the classification-with-linear-regression approach: (i) fit a linear model using the class labels 2 and 3 as the response, and (ii) use the classification rule that assigns class 2 if the predicted response is less than 2.5, and class 3 otherwise. In R, you can use the function \(\texttt{lm()}\) for linear regression and the \(\texttt{knn()}\) function from the R package \(\texttt{class}\) for \(k\)-nearest neighbours.
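A minimal sketch of the two classifiers, assuming the 2s and 3s of the zipcode data have been read into data frames \(\texttt{train}\) and \(\texttt{test}\) whose first column \(\texttt{y}\) holds the digit label (these names are assumptions, not part of the exercise):

```r
## Sketch only: assumes data frames `train` and `test` whose first
## column `y` is the digit label (2 or 3) and the rest are pixel features.
fit  <- lm(y ~ ., data = train)                  # regression on the labels
yhat <- ifelse(predict(fit, test) < 2.5, 2, 3)   # the classification rule
mean(yhat != test$y)                             # test misclassification rate

library(class)                                   # k-NN for comparison
knn_hat <- knn(train[, -1], test[, -1], cl = train$y, k = 3)
mean(knn_hat != test$y)
```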
Exercise set 2
Pen-and-paper
Coding
- We are going to consider a data set consisting of 252 observations of an estimated percentage of body fat along with 13 continuous input variables (age, weight, height and 10 body circumference measurements). You can find the data in edu_bodyfat_both > edu_bodyfat > edu_bodyfat.csv in the file downloaded from this link. The aim is to predict the percentage of body fat (variable \(\texttt{pcfat}\)) from the input variables using a linear model with subset selection. More specifically, apply best-subset, forward and backward selection, plot the RSS for each method against the number of included predictors, and comment on the results. Is there any clear difference between the methods in terms of RSS? Which predictors appear to be the most important?
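One way to carry this out is with \(\texttt{regsubsets()}\) from the \(\texttt{leaps}\) package (a suggestion, not prescribed by the exercise); the sketch assumes the CSV has been read into a data frame \(\texttt{bodyfat}\):

```r
## Sketch only: best-subset, forward and backward selection with leaps,
## assuming the data frame `bodyfat` contains `pcfat` plus 13 inputs.
library(leaps)
bodyfat <- read.csv("edu_bodyfat.csv")   # adjust the path as needed
p <- ncol(bodyfat) - 1                   # number of input variables

rss <- sapply(c("exhaustive", "forward", "backward"), function(m)
  summary(regsubsets(pcfat ~ ., data = bodyfat, nvmax = p, method = m))$rss)

matplot(1:p, rss, type = "b", pch = 1:3,
        xlab = "Number of predictors", ylab = "RSS")
legend("topright", c("best subset", "forward", "backward"),
       pch = 1:3, col = 1:3, lty = 1:3)
```

The selected variables at each subset size can be inspected with \(\texttt{summary()}\) on each fit, which helps answer which predictors matter most.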
Exercise set 3
Pen-and-paper
Coding
- Problem 3.17: The data set can be downloaded from here (and here is some information about it). Treat the binary 0/1 spam indicator as a continuous outcome. Summarize the analysis by computing the training and test error for each method. An indicator variable for splitting the data into training and test sets can be downloaded here. Use e.g. cross-validation to select the value of any tuning parameter. In R, you can use the package \(\texttt{glmnet}\) for ridge/lasso regression and the package \(\texttt{pls}\) for PCR/PLS regression; both packages contain functions for cross-validation.
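A rough outline of the fits, assuming the predictors are in a matrix \(\texttt{x}\), the 0/1 spam indicator in \(\texttt{y}\), and the train/test split in a logical vector \(\texttt{is\_test}\) (all names are assumptions):

```r
## Sketch only: ridge/lasso via cv.glmnet, PCR via the pls package.
library(glmnet)
xtr <- x[!is_test, ]; ytr <- y[!is_test]
xte <- x[is_test, ];  yte <- y[is_test]
mse <- function(pred) mean((pred - yte)^2)

ridge <- cv.glmnet(xtr, ytr, alpha = 0)   # lambda chosen by 10-fold CV
lasso <- cv.glmnet(xtr, ytr, alpha = 1)
mse(predict(ridge, xte, s = "lambda.min"))
mse(predict(lasso, xte, s = "lambda.min"))

library(pls)
dtr <- data.frame(y = ytr); dtr$x <- xtr
dte <- data.frame(y = yte); dte$x <- xte
pcr_fit <- pcr(y ~ x, data = dtr, validation = "CV")
validationplot(pcr_fit)   # pick ncomp where the CV error levels off
mse(drop(predict(pcr_fit, newdata = dte, ncomp = 20)))  # 20 is illustrative
```

\(\texttt{plsr()}\) is used the same way for PLS regression.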
- Download the handwritten ZIP code data from the ESL repository (note that there are separate training and test sets). Use the \(\texttt{glmnet}\) package to fit a multinomial logistic regression model on the training set using the lasso (\(\alpha = 1\)), ridge regression (\(\alpha = 0\)) and the elastic net with \(\alpha = 0.5\). Use cross-validation to select the value of the penalty parameter \(\lambda\). Compare the prediction errors of the different methods on the test set and comment on the results. Is there any particular pair of digits that appears more difficult to distinguish from each other?
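The three penalized fits can be handled in one loop over \(\alpha\); the sketch assumes training features/labels in \(\texttt{xtr}\)/\(\texttt{ytr}\) and test data in \(\texttt{xte}\)/\(\texttt{yte}\) (hypothetical names):

```r
## Sketch only: multinomial fits for three values of alpha, with
## lambda chosen by cross-validation in each case.
library(glmnet)
fit_one <- function(a) {
  cv  <- cv.glmnet(xtr, ytr, family = "multinomial", alpha = a)
  hat <- drop(predict(cv, xte, s = "lambda.min", type = "class"))
  list(err = mean(hat != yte),
       confusion = table(truth = yte, predicted = hat))
}
res <- lapply(c(lasso = 1, enet = 0.5, ridge = 0), fit_one)
sapply(res, `[[`, "err")   # test error for each method
res$lasso$confusion        # off-diagonal cells reveal confused digit pairs
```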
Exercise set 4
Pen-and-paper
Coding
- Problem 7.9: Consider only AIC and BIC for now. In R, you can use the package \(\texttt{leaps}\) for best-subset selection and for computing BIC and the \(C_p\) statistic (which is equivalent to AIC in the setting considered here).
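Assuming the problem is worked on the book's prostate data (response \(\texttt{lpsa}\)) loaded into a data frame \(\texttt{prostate}\), the \(\texttt{leaps}\) part might look like:

```r
## Sketch only: best-subset selection, then Cp (~ AIC) and BIC per size.
library(leaps)
fit <- regsubsets(lpsa ~ ., data = prostate, nvmax = 8)
s   <- summary(fit)
s$cp                                              # Cp for each subset size
s$bic                                             # BIC for each subset size
c(cp = which.min(s$cp), bic = which.min(s$bic))   # selected model sizes
```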