# Exercise 2 on correlation

 

# In this exercise we will perform some simulations that illustrate the empirical correlation coefficient and Fisher's z-transform of the empirical correlation coefficient.

# In order to keep it simple, we will assume throughout the exercise that the expected values are 0 and the variances are 1.

 

# Before you start the computations, load the "MASS" library, which provides the mvrnorm function used below:

 

library(MASS)

 

 

 

# a)

# Generate 25 observations (x,y) from the bivariate normal distribution with correlation 0.30 (see slide 19 from the lectures).

# Compute the empirical correlation coefficient and plot the observations:

 

n=25

r=0.30

m=matrix(c(0,0),nrow=2)

S=matrix(c(1,r,r,1),nrow=2)

obs=mvrnorm(n,m,S)

x=obs[,1]

y=obs[,2]

cor(x,y)

plot(x,y)

 

# Repeat the commands a number of times. Note how the empirical correlation coefficient and the plot vary.

# This will help give you an intuition for what scatter plots may look like when the (true) correlation is 0.30.
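
# A minimal sketch of one way to repeat a) without re-running the commands by hand:
# draw four independent samples and show them in a 2x2 grid of scatter plots.
# The helper name sim_cor and the 2x2 layout are illustrative choices, not part of the exercise.

library(MASS)

sim_cor=function(n,r)
{
  # Simulate n pairs (x,y) with expected values 0, variances 1 and correlation r
  S=matrix(c(1,r,r,1),nrow=2)
  mvrnorm(n,c(0,0),S)
}

old_par=par(mfrow=c(2,2))
for (k in 1:4)
{
  obs_k=sim_cor(25,0.30)
  # The panel title shows the empirical correlation coefficient of that sample
  plot(obs_k[,1],obs_k[,2],xlab="x",ylab="y",main=paste("empirical cor =",round(cor(obs_k[,1],obs_k[,2]),2)))
}
par(old_par)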

 

 

# b)

# Repeat a) for correlation 0.60 and correlation 0.90. Note what the plots look like when the correlation is 0.60 and 0.90.

 

 

# c)

# Repeat a) and b) for n=100 and n=400. Note how the variation in the empirical correlation coefficient depends on the sample size.
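
# A rough sketch for c): instead of judging the variation by eye, estimate the standard
# deviation of the empirical correlation coefficient for each sample size.
# The name sd_of_cor and the choice of 1000 replications are assumptions made for illustration.

library(MASS)

sd_of_cor=function(n,r,nrep=1000)
{
  # Simulate nrep samples of size n and return the standard deviation of their correlations
  S=matrix(c(1,r,r,1),nrow=2)
  cors=replicate(nrep,{ obs_c=mvrnorm(n,c(0,0),S); cor(obs_c[,1],obs_c[,2]) })
  sd(cors)
}

for (n_c in c(25,100,400))
{
  print(c(n=n_c,sd=sd_of_cor(n_c,0.30)))
}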

 

 

 

# d)

# Generate 25 observations (x,y) from the bivariate normal distribution with correlation 0.90.

# Compute the empirical correlation coefficient and Fisher's z-transform (see slide 12 from the lectures).

# Repeat this 1000 times, so that you get 1000 values of the empirical correlation coefficient and Fisher's z-transform:

 

n=25

r=0.90

m=matrix(c(0,0),nrow=2)

S=matrix(c(1,r,r,1),nrow=2)

rho=z=rep(0,1000)

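for (i in 1:1000)
{
  # Draw a new sample of n observations in each repetition
  obs=mvrnorm(n,m,S)
  x=obs[,1]
  y=obs[,2]
  # Empirical correlation coefficient of this sample
  rho[i]=cor(x,y)
  # Fisher's z-transform of the empirical correlation coefficient
  z[i]=0.5*log((1+rho[i])/(1-rho[i]))
}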

 

# Make histograms of the empirical correlation coefficient and Fisher's z-transform (one at a time)

 

hist(rho)

hist(z)

 

# Which of the two histograms looks like a normal distribution?
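
# A histogram can be hard to judge by eye, so one possible complement is a normal Q-Q plot.
# This sketch regenerates its own values so that it runs on its own; the names r_sim and z_sim
# are illustrative. atanh() is R's built-in inverse hyperbolic tangent, which equals
# 0.5*log((1+r)/(1-r)). The value 1/sqrt(n-3) is the usual large-sample approximation to the
# standard deviation of Fisher's z; it is not taken from this exercise, so check it against the lectures.

library(MASS)

n_d=25
S_d=matrix(c(1,0.90,0.90,1),nrow=2)
r_sim=replicate(1000,{ obs_d=mvrnorm(n_d,c(0,0),S_d); cor(obs_d[,1],obs_d[,2]) })
z_sim=atanh(r_sim)

old_par=par(mfrow=c(1,2))
qqnorm(r_sim,main="Empirical correlation")
qqline(r_sim)
qqnorm(z_sim,main="Fisher's z-transform")
qqline(z_sim)
par(old_par)

# Simulated standard deviation of z next to the approximation 1/sqrt(n-3)
print(c(simulated=sd(z_sim),approx=1/sqrt(n_d-3)))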

 

 

# e)

# Optional: Repeat d) for correlations 0.60, 0.30 and 0. What do you learn from these simulations?
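
# A possible starting point for e): run the simulation from d) for several true correlations
# and show all histograms together, with the empirical correlation coefficients in the top row
# and Fisher's z-transforms in the bottom row.
# The names sim_r, true_cors and sims are illustrative, and 0.90 is included only for comparison with d).

library(MASS)

sim_r=function(n,r,nrep=1000)
{
  # Return nrep empirical correlation coefficients, each from a new sample of size n
  S=matrix(c(1,r,r,1),nrow=2)
  replicate(nrep,{ obs_e=mvrnorm(n,c(0,0),S); cor(obs_e[,1],obs_e[,2]) })
}

true_cors=c(0,0.30,0.60,0.90)
sims=lapply(true_cors,function(r) sim_r(25,r))

old_par=par(mfrow=c(2,4))
for (k in 1:4) hist(sims[[k]],main=paste("cor, true",true_cors[k]),xlab="empirical correlation")
for (k in 1:4) hist(atanh(sims[[k]]),main=paste("z, true",true_cors[k]),xlab="Fisher's z")
par(old_par)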