R-help to exercise 2 in ST301 exam spring 2000

 

 

# Read the data into a dataframe, give names to the variables, and inspect the data:

beetle<-read.table("http://www.math.uio.no/avdc/kurs/ST301/data/beetle.dat")
names(beetle)<-c("dose","ant","dode")    # "ant" is number and "dode" is the number killed
beetle

 

# Make sure that the data correspond to those given on page 69 in BS

# (but not that in BS the 10-logarithm of the dose is given)

 

 

# Attach the dataframe

attach(beetle)

 
# Compute the proportion dead at the various doses and plot the proportions versus log10-dose:
proportion<-dode/ant
plot(log10(dose), proportion, pch=3, ylim=c(0,1))
 

# Make sure that the figure corresponds to the one given on page 69 in BS

 
 
# QUESTION a)
# Fit a logistic regression model with log10-dose as covariate and look at the result:
# (Note that since here the outcome is binomial, not binary, the response has to be specified as
# cbind(y,n-y) where n=number of individuals in a group and y=number of "successes" in the group.)
beetle.a<-glm(cbind(dode,ant-dode)~log10(dose),family=binomial)
summary(beetle.a)
 
 
# QUESTION b)
# Another regression model for binomial (and binary) data is to use a so-called complementary log-log link (cf. the exam problem): 
beetle.b<-glm(cbind(dode,ant-dode)~log10(dose), family=binomial(link=cloglog))
summary(beetle.b)
 
 
# To see how the two models fit the data, it is useful to plot the observed proportions and the estimated probabilities for the two models versus log10-dose:
plot(log10(dose), proportion, pch=3, ylim=c(0,1))
points(log10(dose), beetle.a$fit, pch=1)
points(log10(dose), beetle.b$fit, pch=2)
legend(1.82,0.3,c("proportion","modell a","modell b"),pch=c(3,1,2))
 
 
# To see which of the two models has the best fit, we may also look at the residual deviance for the two models. What do these tell you?
 
# Since we have binomial data with fairly many beetles in each group, we get a formal test for goodness-of-fit 
# by comparing the residual deviance with the chi-squard distribution with  8-2=6 degrees of freedom. 
# For the model in a) we get the P-value for such a test by:
1-pchisq(beetle.a$dev, beetle.a$df.res)
 
# Which of the two models gives the best fit to the data?
# What is the problem with the model that has the poorest fit?