R-help to exercise 14 in BSS

 

 

# Read the data into a dataframe, give names to the variables, and inspect the data:

cancerdata<-read.table("http://www.math.uio.no/avdc/kurs/STK4900/data/cancer.dat")
names(cancerdata)<-c("alder","sig","pyr","cancer")
cancerdata

 

# The data in this exercise are the same as those given on page 80 in BS, but note that the data file gives the actual age, rather than age-20 as used by BS.

# Note that it is a typo when BS and BSS mention "years of cigarette smoking"; it should be age.

 

 

# Attach the dataframe

attach(cancerdata)

 

 

# QUESTIONS 1 & 2)

 

# We first consider the model E(Y) = n*exp{b0+b1*s+b2*a}, where
#      Y = number of cancer cases (= cancer),
#      n = number of person years (= pyr),
#      s = number of cigarettes smoked per day (= sig),
#      a = age in years (= alder).
# We may write the model in the form E(Y) = exp{1*log(n)+b0+b1*s+b2*a}.
# Note that log(n) appears as a sort of "covariate" whose regression coefficient is known to take the value 1. This is called an OFFSET.
 
# We fit the model and look at the result:
cancerfit.1<-glm(cancer~offset(log(pyr))+alder+sig, family=poisson)
summary(cancerfit.1)
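
# A minimal sketch of how the offset works, using a small made-up data set (the data frame toy and all its values are hypothetical, not the cancer data).
# It illustrates that writing offset(log(pyr)) in the formula and passing offset=log(pyr) as an argument to glm should give the same fit, and that no coefficient is estimated for the offset.

```r
# Hypothetical toy data (NOT the cancer data): cases, person years, age, cigarettes per day
toy <- data.frame(
  cases = c(2, 3, 5, 8, 4, 9),
  pyr   = c(1000, 1200, 900, 1100, 500, 700),
  a     = c(40, 50, 60, 40, 50, 60),
  s     = c(0, 0, 0, 20, 20, 20)
)

# Offset written inside the model formula
fit.formula <- glm(cases ~ offset(log(pyr)) + a + s, family = poisson, data = toy)

# The same offset supplied through the offset argument of glm
fit.argument <- glm(cases ~ a + s, offset = log(pyr), family = poisson, data = toy)

# Only b0, b1 and b2 are estimated; the coefficient of log(pyr) is fixed at 1
cbind(formula = coef(fit.formula), argument = coef(fit.argument))
```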
 
# Make sure that you understand what the output tells you!
# Are there significant effects of age and the number of cigarettes smoked?
 
# It is common to report the results of a Poisson regression by means of the rate ratio RR = exp(beta) with confidence limits.
# R does not do this directly, so we define a small function that makes such a table.
# (The function may also be used to make tables of odds ratios OR for logistic regression.)
# Copy the function and paste it into R:
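RRCI<-function(glmobj)
{
  # Coefficient table: column 1 = estimates, column 2 = standard errors
  regtab<-summary(glmobj)$coef
  # Rate ratios with approximate 95% Wald confidence limits
  RR<-exp(regtab[,1])
  RRL<-RR*exp(-1.96*regtab[,2])
  RRU<-RR*exp(1.96*regtab[,2])
  cbind(RR,RRL,RRU)
}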
 
 
# Use the function to compute rate ratios for age and number of cigarettes (rounded to three decimals)
 
round(RRCI(cancerfit.1),3)
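
# As a cross-check, the same Wald limits can be obtained with confint.default, which uses the normal quantile qnorm(0.975) instead of 1.96, so the two tables should agree to about three decimals.
# A minimal sketch on hypothetical toy data (not the cancer data); RRCI is repeated here only so that the block runs on its own.

```r
# Hypothetical toy data (NOT the cancer data)
toy <- data.frame(
  cases = c(2, 3, 5, 8, 4, 9),
  pyr   = c(1000, 1200, 900, 1100, 500, 700),
  a     = c(40, 50, 60, 40, 50, 60),
  s     = c(0, 0, 0, 20, 20, 20)
)
fit <- glm(cases ~ offset(log(pyr)) + a + s, family = poisson, data = toy)

# Same function as defined above
RRCI <- function(glmobj) {
  regtab <- summary(glmobj)$coef
  RR <- exp(regtab[, 1])
  RRL <- RR * exp(-1.96 * regtab[, 2])
  RRU <- RR * exp(1.96 * regtab[, 2])
  cbind(RR, RRL, RRU)
}

# Table from RRCI and table from exponentiated Wald limits
tab.rrci <- RRCI(fit)
tab.wald <- exp(cbind(RR = coef(fit), confint.default(fit)))

round(tab.rrci, 3)
round(tab.wald, 3)
```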
 

# Give an interpretation of what the table tells you about the effect of age and the number of cigarettes smoked

 

 

 

# QUESTION 3)

 

# We then look at a model with second-order terms and an interaction:

cancerfit.3<-glm(cancer~offset(log(pyr))+ alder+I(alder^2)+sig+I(sig^2)+alder:sig, family=poisson)

 

# Reduce the model by eliminating non-significant covariates one at a time (step-wise).

# (Use Wald tests from the summary-command and/or deviances from the anova-command.)
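
# A minimal sketch of one elimination step, on hypothetical toy data (not the cancer data; the object names toy, fit.big and fit.small are made up for illustration).
# It compares a model with and without an interaction term, using the Wald test from summary and the deviance test from anova.

```r
# Hypothetical toy data (NOT the cancer data): 4 age groups x 3 smoking groups
toy <- expand.grid(a = c(40, 50, 60, 70), s = c(0, 10, 20))
toy$pyr   <- c(2000, 1800, 1500, 1000, 1500, 1300, 1000, 700, 1000, 900, 700, 400)
toy$cases <- c(1, 3, 6, 8, 2, 5, 8, 10, 3, 6, 9, 9)

# Larger model with interaction, and the reduced model without it
fit.big   <- glm(cases ~ offset(log(pyr)) + a + s + a:s, family = poisson, data = toy)
fit.small <- glm(cases ~ offset(log(pyr)) + a + s, family = poisson, data = toy)

# Wald test for the interaction: see the row "a:s" in the coefficient table
summary(fit.big)$coef

# Deviance (likelihood ratio) test for the same term
anova(fit.small, fit.big, test = "Chisq")

# The same test computed by hand from the two deviances
dev.diff <- deviance(fit.small) - deviance(fit.big)
df.diff  <- df.residual(fit.small) - df.residual(fit.big)
p.value  <- pchisq(dev.diff, df.diff, lower.tail = FALSE)
p.value
```

# If the term is not significant, continue from the reduced model and test the next candidate term in the same way.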

# Discuss the interpretation of your "final model".

 

 

 

# ADDITIONAL QUESTION:

# Age and the number of cigarettes smoked are reported in intervals. 
# We may alternatively consider these covariates as categorical.
# Such a model is fitted by the command:
cancerfit.a<-glm(cancer~offset(log(pyr))+factor(alder)+factor(sig), family=poisson)

 

# Give an interpretation of this model.

# Discuss how the model may be used to assess the fit of your "final model" from question 3.
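
# One possible approach, sketched on hypothetical toy data (not the cancer data): a model with numeric age and cigarettes is nested in the model with categorical covariates, so the two deviances can be compared.
# Here fit.final is only a stand-in for your own "final model"; if your final model contains an interaction, check that it is still nested in the categorical model before using this comparison.

```r
# Hypothetical toy data (NOT the cancer data): 4 age groups x 3 smoking groups
toy <- expand.grid(a = c(40, 50, 60, 70), s = c(0, 10, 20))
toy$pyr   <- c(2000, 1800, 1500, 1000, 1500, 1300, 1000, 700, 1000, 900, 700, 400)
toy$cases <- c(1, 3, 6, 8, 2, 5, 8, 10, 3, 6, 9, 9)

# Stand-in for the "final model": numeric age and number of cigarettes
fit.final <- glm(cases ~ offset(log(pyr)) + a + s, family = poisson, data = toy)

# Model with both covariates treated as categorical
fit.cat <- glm(cases ~ offset(log(pyr)) + factor(a) + factor(s), family = poisson, data = toy)

# Deviance test: a large p-value suggests the simpler model does not fit clearly worse
anova(fit.final, fit.cat, test = "Chisq")

# The ingredients of the test
dev.diff <- deviance(fit.final) - deviance(fit.cat)
df.diff  <- df.residual(fit.final) - df.residual(fit.cat)
c(dev.diff = dev.diff, df.diff = df.diff)
```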