# Help for exercise 3 in the R-exercises

 

 

# QUESTION b)

 

# Generate 100 standard normally distributed observations of x1, x2, and e with cor(x1,x2) = 0.5

# This gives 100 observations of y = x1 + x2 + e

# Fit the model with x1 and x2 and the model without x2:

 

rho<-0.5

x1<-rnorm(100)

x2<-rho*x1+sqrt(1-rho^2)*rnorm(100)
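
# (Both x1 and the extra noise term are standard normal and independent, so
#  x2 has variance rho^2 + (1-rho^2) = 1 and cov(x1,x2) = rho,
#  i.e. cor(x1,x2) = rho = 0.5.)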

e<-rnorm(100)

y<-x1+x2+e

summary(lm(y~x1+x2))

summary(lm(y~x1))
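
# An optional extra check (not part of the original exercise): the data were
# generated with true coefficients 1 for both x1 and x2, so the 95% confidence
# intervals from confint() should usually cover 1 in the full model, while the
# slope in the model without x2 tends to land well above 1:

confint(lm(y~x1+x2))

confint(lm(y~x1))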

 

 

# Look at the estimates for the two models. How do they agree with the theory in question a?

# Discuss the implications this result may have if an important covariate is left out in a regression analysis.

 

 

# Check if the estimate in the model without x2 agrees with the formula given in the introduction to the exercise:

 

cor(x1,x2)

koef<-lm(y~x1+x2)$coef

koef

koef[2]+koef[3]*cor(x1,x2)*(sd(x2)/sd(x1))

lm(y~x1)$coef

 

# Make sure that you understand the computations! Is the result in agreement with the theory?
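
# For reference (not part of the original script): with true coefficients
# beta1 = beta2 = 1 and sd(x1) = sd(x2) = 1, the population version of the
# formula gives the slope in the model without x2 as
# beta1 + beta2*rho*1 = 1 + 0.5 = 1.5, so the estimate from lm(y~x1)
# should be close to this value.

1+1*rho   # = 1.5 when rho = 0.5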

 

 

# QUESTION c)

 

# Generate new data with cor(x1,x2) = 0.95 and y = x1 - x2 + e.

# Fit linear regression models with each of the covariates separately and with both the covariates:

 

rho<-0.95

x1<-rnorm(100)

x2<-rho*x1+sqrt(1-rho^2)*rnorm(100)

e<-rnorm(100)

y<-x1-x2+e

summary(lm(y~x1))

summary(lm(y~x2))

summary(lm(y~x1+x2))

 

# Look at the results for the regression models with only one covariate and for the model with both covariates.

# Discuss the implications these results may have if an important covariate is left out in a regression analysis.
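
# A possible way to quantify the collinearity (a sketch, not part of the
# original exercise): with two covariates, the variance inflation factor is
# 1/(1 - cor(x1,x2)^2), i.e. how much the variance of each slope estimate is
# inflated compared to uncorrelated covariates.

1/(1-cor(x1,x2)^2)   # close to 1/(1-0.95^2), i.e. roughly 10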

 

 

 

# QUESTION d)

 

# Do 100 simulations of the situation in question b:

rho<-0.50

koefsim<-numeric(0)

for (i in 1:100)

{

  x1<-rnorm(100)

  x2<-rho*x1+sqrt(1-rho^2)*rnorm(100)

  e<-rnorm(100)

  y<-x1+x2+e

  koefsim<-rbind(koefsim,lm(y~x1+x2)$coef[2:3])

}

plot(koefsim[,1],koefsim[,2])

cor(koefsim)

 

 

# What is the correlation between the estimates?

# Repeat the simulations with rho=0.90, rho=-0.50, and rho=-0.90. What do you see?
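
# One way to repeat the simulations for several values of rho (a sketch; the
# function name simkoef is made up for this illustration). With covariates of
# equal variance and correlation rho, theory suggests a correlation of roughly
# -rho between the two slope estimates, so the sign and strength of the
# pattern in the plot should change with rho:

simkoef<-function(rho,nsim=100,n=100)
{
  koefsim<-matrix(NA,nsim,2)
  for (i in 1:nsim)
  {
    x1<-rnorm(n)
    x2<-rho*x1+sqrt(1-rho^2)*rnorm(n)
    e<-rnorm(n)
    y<-x1+x2+e
    koefsim[i,]<-lm(y~x1+x2)$coef[2:3]
  }
  koefsim
}

cor(simkoef(0.90))

cor(simkoef(-0.50))

cor(simkoef(-0.90))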