R-help to exercise 22
# Read the data into a dataframe, give names to the variables, and inspect the data:
cancerdata=hers=read.table("http://www.uio.no/studier/emner/matnat/math/STK4900/data/cancer.txt")
names(cancerdata)=c("age","cig","pyr","cancer")
cancerdata
# Make sure that the data are the same as given in the exercise.
# Questions 1 & 2)
# We first consider the model E(Y) = n*exp{b0+b1*s+b2*a}, where
# Y=number of cancer cases (=cancer),
# n=number of person years (= pyr),
# s=number of cigarettes smoked per day (=cig)
# a = age in years (=age)
# We may write the model on the form E(Y)= exp{1*log(n)+b0+b1*s+b2*a}.
# Note that log(n) appears as a sort of "covariate" where we know that the regression coefficient takes the value 1. This is called an OFFSET.
# We fit the model and look at the result::
cancerfit.1=glm(cancer~offset(log(pyr))+age+cig, data=cancerdata, family=poisson)
summary(cancerfit.1)
# Make sure that you understand what the output tells you!.
# Are there significant effects of age and the number of cigarettes smoked?
# It is common to report the results of a Poisson regression by means of the rate ratio RR = exp(beta) with confidence limits.
# To this end we may use the function expcoef from exercise 18.
# Use the function to compute rate ratios for age and number of cigarettes:
expcoef(cancerfit.1)
# Give an interpretation of what the table tells you about the effect of age and the number of cigarettes smoked
# QUESTION 3)
# We then look at a model with second order terms and interaction:
cancerfit.3=glm(cancer~offset(log(pyr))+ age+I(age^2)+cig+I(cig^2)+age:cig, data=cancerdata, family=poisson)
# Reduce the model by (step-wise) eliminating non-significant covariates.
# (Use Wald tests from the summary-command and/or deviances from the anova-command.)
# Discuss the interpretation of your "final model".
# ADDITIONAL QUESTION:
# Age and the number of cigarettes smoked are reported in intervals.
# We may alternatively consider these covariates as categorical.
# Such a model is fitted by the command:
cancerfit.a=glm(cancer~offset(log(pyr))+factor(age)+factor(cig), data=cancerdata, family=poisson)
# Give an interpretation of this model.
# Discuss how the model may be used to assess the fit of your "final model" from question 3.