Exercises for Tue Feb 21
1. On Tue Feb 14 I went somewhat briefly through the basics of regression models and the AIC and AIC^*. We'll be coming back to parts of this in connection with the exercises. This is partly in the spirit of Aristotle: "For the things we have to learn before we can do them, we learn by doing them." I also gave a mini-summary of the BIC, with more to come next week. We also spent time on the nerve data exam question from 2015, and on other details.
2. Exercises for Tue Feb 21: Go to the Claeskens and Hjort book website
http://feb.kuleuven.be/public/ndbaf45/modelselection/
and download the dataset on small babies; check the description of the various components. Concentrate on the variables x = weight of mother in kg, z1 = smoke (1 for smokers, 0 for non-smokers), z2 = ftv (number of visits to physician during the first trimester), z3 = indicator for "race" being "white", which you get by writing z3 = 1*(race==1). With y = indicator for the baby being small (weighing less than 2500 grams), carry out logistic regression for the 2^3 = 8 candidate models corresponding to putting z1, z2, z3 in and out (but keep x in each of them). Use glm(y ~ x + z1 + z2 + z3, family=binomial), etc., or write your own logL programmes. Compute AIC and BIC scores, and discuss your findings. In addition, for each of the 8 candidate models, estimate the probability p that Mrs Jones's baby will be small! Mrs Jones is a white, smoking mother of weight 60 kg, who has not visited her physician. For each of the 8 models, also give a 90% confidence interval for p.
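One possible way to organise the computations in R is sketched below; it is a starting point under stated assumptions, not the required solution. The data frame `dat` is a simulated stand-in (not the real dataset) so that the code runs on its own; replace it with the downloaded data, using the column names y, x, z1, z2, z3 from the exercise text. The confidence interval is a Wald interval computed on the logit scale and transformed back to the probability scale, which is one of several reasonable choices. Note also that R's AIC() and BIC() return -2 logL plus a penalty, so smaller is better; if the course convention is 2 logL minus a penalty, with larger being better, flip the sign.

```r
# Simulated stand-in data so the sketch runs on its own; NOT the real dataset.
# Replace `dat` with the downloaded data, with z3 = 1*(race==1) as in the text.
set.seed(1)
n <- 200
dat <- data.frame(
  x  = round(rnorm(n, mean = 60, sd = 12)),  # mother's weight in kg
  z1 = rbinom(n, 1, 0.4),                    # smoker indicator
  z2 = rpois(n, 0.8),                        # first-trimester physician visits
  z3 = rbinom(n, 1, 0.5)                     # indicator for "white"
)
dat$y <- rbinom(n, 1, plogis(0.5 - 0.03 * dat$x + 0.6 * dat$z1 - 0.4 * dat$z3))

# Mrs Jones: white, smoker, 60 kg, no visits to her physician
jones <- data.frame(x = 60, z1 = 1, z2 = 0, z3 = 1)

# All 2^3 = 8 in/out patterns for z1, z2, z3; x is always kept
patterns <- expand.grid(z1 = c(FALSE, TRUE),
                        z2 = c(FALSE, TRUE),
                        z3 = c(FALSE, TRUE))

zq <- qnorm(0.95)  # normal quantile for a two-sided 90% interval

results <- do.call(rbind, lapply(seq_len(nrow(patterns)), function(i) {
  extra <- names(patterns)[unlist(patterns[i, ])]
  form  <- reformulate(c("x", extra), response = "y")
  fit   <- glm(form, family = binomial, data = dat)

  # Estimate and standard error of the linear predictor (logit scale)
  pr  <- predict(fit, newdata = jones, type = "link", se.fit = TRUE)
  eta <- as.numeric(pr$fit)
  se  <- as.numeric(pr$se.fit)

  data.frame(
    model = paste(c("x", extra), collapse = " + "),
    AIC   = AIC(fit),               # R convention: smaller is better
    BIC   = BIC(fit),               # R convention: smaller is better
    p_hat = plogis(eta),            # estimated probability for Mrs Jones
    lower = plogis(eta - zq * se),  # 90% Wald limits, back-transformed
    upper = plogis(eta + zq * se),
    row.names = NULL
  )
}))

print(results[order(results$AIC), ], digits = 4)
```

Working on the logit scale keeps the interval inside (0, 1); an alternative is the delta method applied directly to p, which can give limits outside that range when p is close to 0 or 1.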