R-hjelp til oppgave 2 ST301 eksamen V2000 �C STK4900 - Spring 2009 �C Universitetet i Oslo

STK4900 - Spring 2009

R-help to exercise 2 in ST301 exam spring 2000

# Read the data into a dataframe, give names to the variables, and inspect the data:

beetle<-read.table("http://www.math.uio.no/avdc/kurs/ST301/data/beetle.dat")

names(beetle)<-c("dose","ant","dode")    # "ant" is number and "dode" is the number killed

beetle

# Make sure that the data correspond to those given on page 69 in BS

# (but not that in BS the 10-logarithm of the dose is given)

# Attach the dataframe

attach(beetle)

# Compute the proportion dead at the various doses and plot the proportions versus log10-dose:

proportion<-dode/ant

plot(log10(dose), proportion, pch=3, ylim=c(0,1))

# Make sure that the figure corresponds to the one given on page 69 in BS

# QUESTION a)

# Fit a logistic regression model with log10-dose as covariate and look at the result:

# (Note that since here the outcome is binomial, not binary, the response has to be specified as

# cbind(y,n-y) where n=number of individuals in a group and y=number of "successes" in the group.)

beetle.a<-glm(cbind(dode,ant-dode)~log10(dose),family=binomial)

summary(beetle.a)

# QUESTION b)

# Another regression model for binomial (and binary) data is to use a so-called complementary log-log link (cf. the exam problem):

beetle.b<-glm(cbind(dode,ant-dode)~log10(dose), family=binomial(link=cloglog))

summary(beetle.b)

# To see how the two models fit the data, it is useful to plot the observed proportions and the estimated probabilities for the two models versus log10-dose:

plot(log10(dose), proportion, pch=3, ylim=c(0,1))

points(log10(dose), beetle.a$fit, pch=1)

points(log10(dose), beetle.b$fit, pch=2)

legend(1.82,0.3,c("proportion","modell a","modell b"),pch=c(3,1,2))

# To see which of the two models has the best fit, we may also look at the residual deviance for the two models. What do these tell you?

# Since we have binomial data with fairly many beetles in each group, we get a formal test for goodness-of-fit

# by comparing the residual deviance with the chi-squard distribution with  8-2=6 degrees of freedom.

# For the model in a) we get the P-value for such a test by:

1-pchisq(beetle.a$dev, beetle.a$df.res)

# Which of the two models gives the best fit to the data?

# What is the problem with the model that has the poorest fit?