We are going to create a small dataset with data from four different participants, relating to their response to some form of treatment.
Exercise 1
Create a variable containing the numbers 1-4 in ascending order, using the seq() function. Call the variable ID.
ID <- seq(1,4,1) #alternatively seq(1,4)
Exercise 2
Create a variable containing the numbers 2,2,1,1 in that order, using the rep() function. Call the variable group.
group <- rep(c(2,1), each=2) #alternatively rep(c(2,1), c(2,2))
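The difference between the `each` and `times` arguments of `rep()` is easy to mix up, so here is a minimal sketch that puts them side by side. The variable names `by_each` and `by_times` are only illustrative and are not part of the exercise.

```r
# each = 2 repeats every element twice before moving to the next one
by_each <- rep(c(2, 1), each = 2)

# times = 2 repeats the whole vector twice
by_times <- rep(c(2, 1), times = 2)

print(by_each)   # 2 2 1 1 (the order the exercise asks for)
print(by_times)  # 2 1 2 1 (not what the exercise asks for)
```

Passing a vector to `times`, as in the alternative solution `rep(c(2,1), c(2,2))`, gives one repeat count per element and so behaves like `each` here.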
Exercise 3
Create a variable called response, where the possible categories are poor, medium and good. You can choose each person's response freely, but remember that this variable should be converted into an ordered factor.
response <- c("medium", "good", "poor", "good") #remember that even though you only have three categories, you still have four cases, and each of them needs a value.
response <- factor(response, ordered = TRUE, levels = c("poor", "medium", "good"))
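To see why the ordering matters, the sketch below rebuilds the same factor and runs a few operations that only make sense for ordered factors. The name `at_least_medium` is just an illustrative helper.

```r
# Same values and level order as in the solution above
response <- factor(c("medium", "good", "poor", "good"),
                   ordered = TRUE,
                   levels = c("poor", "medium", "good"))

# Comparisons follow the level order poor < medium < good
at_least_medium <- response >= "medium"
print(at_least_medium)       # TRUE TRUE FALSE TRUE

# The underlying integer codes are the positions in the levels vector
print(as.integer(response))  # 2 3 1 3

# is.ordered() confirms that the factor carries an ordering
print(is.ordered(response))  # TRUE
```

With an unordered factor, the comparison `response >= "medium"` would return `NA` with a warning, so the `ordered = TRUE` argument is what makes these comparisons meaningful.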
Exercise 4
Combine all variables into a data frame. You can call the dataset patientData.
patientData <- data.frame(ID, group, response)
Exercise 5
Print case no. 2, then variable 3, and finally the value of variable 3 only for case no. 2.
#case no 2
patientData[2,]
##   ID group response
## 2  2     2     good
#variable no 3
patientData[,3] #alternatively patientData$response; note that patientData[3] returns a one-column data frame rather than a vector
## [1] medium good poor good
## Levels: poor < medium < good
#case no 2 variable 3
patientData[2,3] #alternatively patientData$response[2]
## [1] good
## Levels: poor < medium < good
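The different ways of selecting variable 3 do not all return the same kind of object. The sketch below rebuilds the data frame from the earlier exercises and checks what each form returns. The names `col_vec`, `col_df` and `col_list` are illustrative only.

```r
# Rebuild the data frame from the earlier exercises
ID <- seq(1, 4, 1)
group <- rep(c(2, 1), each = 2)
response <- factor(c("medium", "good", "poor", "good"),
                   ordered = TRUE,
                   levels = c("poor", "medium", "good"))
patientData <- data.frame(ID, group, response)

col_vec  <- patientData[, 3]   # the column itself (an ordered factor)
col_df   <- patientData[3]     # a data frame with a single column
col_list <- patientData[[3]]   # the column itself, list-style extraction

print(class(col_vec))          # "ordered" "factor"
print(class(col_df))           # "data.frame"
print(identical(col_vec, patientData$response))  # TRUE
```

In short, `[ , 3]`, `[[3]]` and `$response` give the column, while `[3]` keeps the data frame structure.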
Exercise 6
Print all cases except case no 1.
patientData[-1,]
##   ID group response
## 2  2     2     good
## 3  3     1     poor
## 4  4     1     good
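Negative indices extend to several rows at once, and rows can also be selected by a logical condition. The following is a small sketch on the same data frame. The object names `without_1_and_3` and `group_one` are made up for illustration.

```r
# Rebuild the data frame from the earlier exercises
patientData <- data.frame(
  ID = seq(1, 4, 1),
  group = rep(c(2, 1), each = 2),
  response = factor(c("medium", "good", "poor", "good"),
                    ordered = TRUE,
                    levels = c("poor", "medium", "good"))
)

# Drop several cases at once by negating a vector of row numbers
without_1_and_3 <- patientData[-c(1, 3), ]
print(without_1_and_3$ID)  # 2 4

# Keep only the cases that satisfy a condition
group_one <- patientData[patientData$group == 1, ]
print(group_one$ID)        # 3 4
```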
Part 2
Exercise 1
Based on the information you can get from str(data$variable) and summary(data$variable), can you find out which years you have data from?
str(gapminder$year) #data is collected every 5 years, starting in 1952
## num [1:1704] 1952 1957 1962 1967 1972 ...
summary(gapminder$year) # latest year is 2007.
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1952    1966    1980    1980    1993    2007
Exercise 2
What happens if you combine the summary function with as.factor(data$variable) for the variable year? Does this change the variable itself?
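A minimal sketch of what to expect, using a small made-up `year` vector rather than the gapminder data itself (the vector and the name `year_counts` are assumptions for illustration):

```r
# Toy stand-in for gapminder$year: every fifth year from 1952 to 2007,
# repeated three times as if we had three countries
year <- rep(seq(1952, 2007, by = 5), times = 3)

# On a numeric vector, summary() reports quartiles and the mean
print(summary(year))

# On a factor, summary() counts the observations per level instead
year_counts <- summary(as.factor(year))
print(year_counts)

# as.factor() returned a converted copy; the original is still numeric
print(is.numeric(year))  # TRUE
```

Because the result of `as.factor()` is never assigned back to `year`, the variable itself is not changed. The same reasoning should apply to `gapminder$year`.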
Calculate the average population for 2007, grouped by the life expectancy categories in lifeCat, and count the number of countries you have in each of the categories (hint: n()).
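One possible approach is sketched below with dplyr, which is where `n()` comes from. It assumes lifeCat already exists as a column from an earlier step. The data frame `toy`, its values, and the output column names `mean_pop` and `n_countries` are invented purely for illustration and are not gapminder results.

```r
library(dplyr)

# Made-up toy data standing in for gapminder with a lifeCat column
toy <- data.frame(
  country = c("A", "B", "C", "D", "E"),
  year    = c(2007, 2007, 2007, 2007, 2002),
  pop     = c(10, 20, 30, 40, 50),
  lifeCat = c("low", "low", "high", "high", "high")
)

pop_by_cat <- toy %>%
  filter(year == 2007) %>%              # keep only the 2007 rows
  group_by(lifeCat) %>%                 # one group per category
  summarise(mean_pop = mean(pop),       # average population per group
            n_countries = n())          # number of rows (countries) per group

print(pop_by_cat)
```

On the real data, the same pipeline would start from `gapminder` instead of `toy`, provided the lifeCat column has been added to it.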