Exercises

  1. Open RStudio and load the mice package. What data are in the package? (Hint: ?data will give you the help file for the data function). What else is in the package?
library(mice)
?data
data( package = "mice")
?`mice-package`
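
The same information can be pulled out programmatically. As a rough sketch: data(package = ...) returns an object whose results component is a character matrix with Item and Title columns. The helper name list_datasets is made up for this illustration, and the demonstration runs on the datasets package so that it works even where mice is not installed.

```r
# Hypothetical helper (not part of any package): list the data sets
# shipped with a package as a two-column character matrix.
list_datasets <- function(pkg) {
  res <- data(package = pkg)$results
  res[, c("Item", "Title"), drop = FALSE]
}

# Demonstrated on 'datasets', which ships with base R.
head(list_datasets("datasets"))

# The same call applies to mice. Functions and other exported objects
# can be listed with getNamespaceExports(). Guarded in case mice is missing.
if (requireNamespace("mice", quietly = TRUE)) {
  print(list_datasets("mice"))
  print(head(sort(getNamespaceExports("mice"))))
}
```

The exports list answers the "what else" part of the question: besides data sets, a package mostly contains functions.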

  1. Look at the data frame ToothGrowth which is part of the datasets package included with base R. Print the data, look at the help file, do a summary. How would you look at whether higher doses of vitamin C were associated with longer teeth?
?ToothGrowth

head(ToothGrowth)
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
summary(ToothGrowth)
##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
table(ToothGrowth$supp, ToothGrowth$dose)
##     
##      0.5  1  2
##   OJ  10 10 10
##   VC  10 10 10
plot( x = ToothGrowth$dose[ ToothGrowth$supp == "VC" ], 
      y = ToothGrowth$len[ ToothGrowth$supp == "VC" ],
      main = "Vitamin C dose and tooth length, a plot",
      xlab="Dose", ylab="Tooth length")

The visual inspection here seems pretty convincing to me. But wouldn’t a ggplot be prettier?

library(dplyr)
library(ggplot2)
ToothGrowth %>% 
  filter(supp == "VC") %>% 
  ggplot(aes( x = dose, y = len)) +
  geom_point( size = 2 ) 
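
To back the visual impression with numbers, one option is to compare mean tooth length across doses and compute a correlation. This is a minimal sketch, not a formal analysis: the variable names vc, means_by_dose and dose_len_cor are chosen here for illustration, and a Pearson correlation assumes a roughly linear relationship.

```r
# Keep only the ascorbic acid (VC) group, as in the plots above.
vc <- subset(ToothGrowth, supp == "VC")

# Mean tooth length at each dose.
means_by_dose <- aggregate(len ~ dose, data = vc, FUN = mean)
means_by_dose

# Pearson correlation between dose and tooth length.
dose_len_cor <- cor(vc$dose, vc$len)
dose_len_cor
```

If the means rise with each dose and the correlation is clearly positive, the numbers agree with the plot.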


  1. Now look at the table HairEyeColor. Look at the help file and print the data. Why does it print so strangely? How do you make a data frame containing only data on female students? Can you compute the number of brown-haired students? Which combination of hair color, eye color and sex has the fewest students?
?HairEyeColor
class(HairEyeColor)
## [1] "table"
HairEyeColor
## , , Sex = Male
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
## 
## , , Sex = Female
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8

HairEyeColor is not just a table, but a 3-dimensional array resulting from cross-tabulating observations on hair color, eye color and sex. Sex makes up the third dimension of the table, so we can get the female students like this:

A <- HairEyeColor[,,"Female"]
class(A)
## [1] "table"
A <- as.data.frame(A)
class(A)
## [1] "data.frame"
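
The three-dimensional structure can be inspected directly. The sketch below uses only base R; sex_totals is a name introduced here for illustration.

```r
# Size of each dimension: 4 hair colors x 4 eye colors x 2 sexes.
dim(HairEyeColor)

# The dimensions are named, which is why "Female" works as an index.
names(dimnames(HairEyeColor))

# Sum over hair and eye color, keeping only the third dimension (Sex).
sex_totals <- margin.table(HairEyeColor, 3)
sex_totals
```

Printing a three-dimensional table on a flat screen forces R to show one two-dimensional slice per level of the last dimension, which explains the unusual output.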

Alternatively, you can start by making HairEyeColor into a data.frame and subset it like this:

# Coerce to data.frame
A <- as.data.frame(HairEyeColor) 
head(A)
##    Hair   Eye  Sex Freq
## 1 Black Brown Male   32
## 2 Brown Brown Male   53
## 3   Red Brown Male   10
## 4 Blond Brown Male    3
## 5 Black  Blue Male   11
## 6 Brown  Blue Male   50
A[A$Sex == "Female",]
##     Hair   Eye    Sex Freq
## 17 Black Brown Female   36
## 18 Brown Brown Female   66
## 19   Red Brown Female   16
## 20 Blond Brown Female    4
## 21 Black  Blue Female    9
## 22 Brown  Blue Female   34
## 23   Red  Blue Female    7
## 24 Blond  Blue Female   64
## 25 Black Hazel Female    5
## 26 Brown Hazel Female   29
## 27   Red Hazel Female    7
## 28 Blond Hazel Female    5
## 29 Black Green Female    2
## 30 Brown Green Female   14
## 31   Red Green Female    7
## 32 Blond Green Female    8

The latter, rather impenetrable, syntax is base R. With dplyr, we can use this (rather prettier) syntax:

library(dplyr)

HairEyeColor %>% 
  as_tibble() %>% 
  filter( Sex == "Female")
## # A tibble: 16 x 4
##    Hair  Eye   Sex        n
##    <chr> <chr> <chr>  <dbl>
##  1 Black Brown Female    36
##  2 Brown Brown Female    66
##  3 Red   Brown Female    16
##  4 Blond Brown Female     4
##  5 Black Blue  Female     9
##  6 Brown Blue  Female    34
##  7 Red   Blue  Female     7
##  8 Blond Blue  Female    64
##  9 Black Hazel Female     5
## 10 Brown Hazel Female    29
## 11 Red   Hazel Female     7
## 12 Blond Hazel Female     5
## 13 Black Green Female     2
## 14 Brown Green Female    14
## 15 Red   Green Female     7
## 16 Blond Green Female     8
## Number of brown-haired students
sum(HairEyeColor["Brown",,])
## [1] 286
## Which group is the smallest?
min(HairEyeColor)
## [1] 2
## Base R
which( HairEyeColor == min(HairEyeColor), arr.ind = TRUE )
##       Hair Eye Sex
## Black    1   4   2
## dplyr syntax
HairEyeColor %>% as_tibble() %>% filter(n == min(n))
## # A tibble: 1 x 4
##   Hair  Eye   Sex        n
##   <chr> <chr> <chr>  <dbl>
## 1 Black Green Female     2
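
Both answers can be cross-checked from the long-format data frame. This is a small sketch in base R; hec, brown_total and smallest are names made up for the example. Note that which.min() returns only the first minimum, so it would miss ties, whereas the comparison with min() used above returns all of them.

```r
# Long format: one row per Hair/Eye/Sex combination, counts in Freq.
hec <- as.data.frame(HairEyeColor)

# Number of brown-haired students, summed over eye color and sex.
brown_total <- sum(hec$Freq[hec$Hair == "Brown"])
brown_total

# Row with the smallest count (first one only, if there are ties).
smallest <- hec[which.min(hec$Freq), ]
smallest
```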

  1. Make a vector containing the numbers 0 to 0.8 by 0.1. Make a vector repeating the numbers 1,2,3 three times. Make a vector containing the letters A to I. Make a data frame out of these three vectors. Name the columns of the data frame - say, “fractions”, “numbers” and “letters”. Add a column containing the log of the sum of “numbers” and “fractions”. Make a histogram of the logs.
a <- seq( from = 0, to = 0.8, by = 0.1)
b <- rep(1:3, 3)
c <- LETTERS[ 1:9 ]

D <- data.frame(fractions = a, numbers = b, letters = c)
D$logs <- log( D$numbers + D$fractions )

hist(D$logs)
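
For comparison, the new column can also be added with dplyr, in the same style as the earlier exercises. This is a sketch assuming dplyr is installed; D2 is a name used here only so the result can sit alongside D.

```r
library(dplyr)

# Build the data frame and add the log column in one pipeline.
D2 <- data.frame(
  fractions = seq(from = 0, to = 0.8, by = 0.1),
  numbers   = rep(1:3, 3),
  letters   = LETTERS[1:9]
) %>%
  mutate(logs = log(numbers + fractions))

# Check the structure: 9 rows and 4 columns.
str(D2)
```

Inside mutate(), columns are referred to by bare name, so the D$ prefix is not needed.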


  1. Save a copy of your data frame to disk. Where is it saved? Where is your working directory? If it isn’t where you want it, move it - say, to a new folder designated for this course.
saveRDS(D, file = "MyData.RDS")

We’re saving the data frame as an .RDS file, which can be read back in with readRDS(). Unless we specify a path elsewhere, it will be saved in our working directory.
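
A round trip with saveRDS() and readRDS() could look like the sketch below. The data frame is rebuilt so the example is self-contained, and the file goes to a temporary folder (via tempdir()) rather than the working directory, purely so that the example leaves nothing behind. The names path and D_restored are illustrative.

```r
# Rebuild the data frame from the previous exercise.
D <- data.frame(
  fractions = seq(from = 0, to = 0.8, by = 0.1),
  numbers   = rep(1:3, 3),
  letters   = LETTERS[1:9]
)

# Write a single object to an .rds file in a temporary folder.
path <- file.path(tempdir(), "MyData.rds")
saveRDS(D, file = path)

# Read it back. Unlike load(), readRDS() returns the object,
# so it can be assigned to any name.
D_restored <- readRDS(path)
all.equal(D, D_restored)
```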

getwd()
## [1] "C:/Users/tgw513/Documents/GitHub/Course-Materials-Bosnia/Contents/Material/Part A - Repetition"

You can change your working directory with the setwd function, but for more portable code, we recommend using RStudio projects instead. Go to the “File” menu, and click “New Project”. Place the project file in the folder you want to work in for this project. When you click “Open Project” and select this project, RStudio will set the working directory to that folder until you close the project.
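
If the file has already been saved in the wrong place, it can be moved from within R. The sketch below works inside a temporary folder so it is safe to run as is. The folder name "r-course" is a made-up placeholder, and a small stand-in data frame takes the place of D.

```r
# Stand-in for the file saved in the previous step.
old_path <- file.path(tempdir(), "MyData.rds")
saveRDS(data.frame(x = 1:3), file = old_path)

# Create the target folder (hypothetical name) if it does not exist yet.
course_dir <- file.path(tempdir(), "r-course")
dir.create(course_dir, showWarnings = FALSE)

# Move the file; file.rename() returns TRUE on success.
new_path <- file.path(course_dir, "MyData.rds")
moved <- file.rename(old_path, new_path)
moved
```

Building paths with file.path() rather than pasting strings keeps the code portable across operating systems.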


End of practical.