mice package. What data are in the package? (Hint: ?data will give you the help file for the data function). What else is in the package?
library(mice)
## Warning: package 'mice' was built under R version 3.6.1
## Warning: package 'lattice' was built under R version 3.6.1
?data
data( package = "mice")
?`mice-package`
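The help pages answer the question interactively; the same information can also be pulled out programmatically. The following is a minimal sketch, not part of the original solution: `package_contents` is a hypothetical helper name, and the call on mice is guarded so the code still runs if the package is not installed.

```r
# Hypothetical helper: list the data sets and exported objects of a package.
package_contents <- function(pkg) {
  list(
    # data() returns a matrix whose "Item" column holds the data set names
    datasets = data(package = pkg)$results[, "Item"],
    # names of everything the package exports (functions and other objects)
    exports  = getNamespaceExports(pkg)
  )
}

# Only inspect mice if it is installed on this machine
if (requireNamespace("mice", quietly = TRUE)) {
  contents <- package_contents("mice")
  print(contents$datasets)
  print(head(sort(contents$exports)))
}
```

Returning a list keeps the two answers (data sets, everything else) separate, which mirrors the two parts of the question.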
ToothGrowth, which is part of the datasets package included with base R. Print the data, look at the help file, do a summary. How would you look at whether higher doses of vitamin C were associated with longer teeth?
?ToothGrowth
head(ToothGrowth)
## len supp dose
## 1 4.2 VC 0.5
## 2 11.5 VC 0.5
## 3 7.3 VC 0.5
## 4 5.8 VC 0.5
## 5 6.4 VC 0.5
## 6 10.0 VC 0.5
summary(ToothGrowth)
## len supp dose
## Min. : 4.20 OJ:30 Min. :0.500
## 1st Qu.:13.07 VC:30 1st Qu.:0.500
## Median :19.25 Median :1.000
## Mean :18.81 Mean :1.167
## 3rd Qu.:25.27 3rd Qu.:2.000
## Max. :33.90 Max. :2.000
table(ToothGrowth$supp, ToothGrowth$dose)
##
## 0.5 1 2
## OJ 10 10 10
## VC 10 10 10
plot( x = ToothGrowth$dose[ ToothGrowth$supp == "VC" ],
y = ToothGrowth$len[ ToothGrowth$supp == "VC" ],
main = "Vitamin C dose and tooth length, a plot",
xlab="Dose", ylab="Tooth length")
The visual inspection here seems pretty convincing to me. But wouldn’t a
ggplot be prettier?
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.1
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.6.1
ToothGrowth %>%
filter(supp == "VC") %>%
ggplot(aes( x = dose, y = len)) +
geom_point( size = 2 )
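A numeric summary can back up the visual impression from the plots. The sketch below is one possible approach, not the only one: it computes the mean tooth length per supplement and dose with base R's aggregate(), and a rank correlation between dose and length for the VC group. The object names `means`, `vc` and `rho` are chosen here for illustration.

```r
# Mean tooth length for every combination of supplement and dose
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
print(means)

# Restrict to the VC group, as in the plots, and order by dose
vc <- means[means$supp == "VC", ]
vc <- vc[order(vc$dose), ]
print(vc)

# Spearman rank correlation between dose and length within the VC group
vc_rows <- ToothGrowth$supp == "VC"
rho <- cor(ToothGrowth$dose[vc_rows], ToothGrowth$len[vc_rows],
           method = "spearman")
print(rho)
```

If the group means rise with dose and the correlation is clearly positive, that agrees with what the scatter plots suggest.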
HairEyeColor. Look at the help file and print the data. Why does it print so oddly? How do you make a data frame containing only data on female students? Can you compute the number of brown-haired students? Which combination of hair color, eye color and sex has the fewest students?
?HairEyeColor
class(HairEyeColor)
## [1] "table"
HairEyeColor
## , , Sex = Male
##
## Eye
## Hair Brown Blue Hazel Green
## Black 32 11 10 3
## Brown 53 50 25 15
## Red 10 10 7 7
## Blond 3 30 5 8
##
## , , Sex = Female
##
## Eye
## Hair Brown Blue Hazel Green
## Black 36 9 5 2
## Brown 66 34 29 14
## Red 16 7 7 7
## Blond 4 64 5 8
HairEyeColor is not just a table, but a 3-dimensional array resulting from cross-tabulating observations on hair color, eye color and sex. Sex makes up the third dimension of the table, so we can get the female students like this:
A <- HairEyeColor[,,"Female"]
class(A)
## [1] "table"
A <- as.data.frame(A)
class(A)
## [1] "data.frame"
Alternatively, you can start by making HairEyeColor into a data.frame and subset it like this:
# Coerce to data.frame
A <- as.data.frame(HairEyeColor)
head(A)
## Hair Eye Sex Freq
## 1 Black Brown Male 32
## 2 Brown Brown Male 53
## 3 Red Brown Male 10
## 4 Blond Brown Male 3
## 5 Black Blue Male 11
## 6 Brown Blue Male 50
A[A$Sex == "Female",]
## Hair Eye Sex Freq
## 17 Black Brown Female 36
## 18 Brown Brown Female 66
## 19 Red Brown Female 16
## 20 Blond Brown Female 4
## 21 Black Blue Female 9
## 22 Brown Blue Female 34
## 23 Red Blue Female 7
## 24 Blond Blue Female 64
## 25 Black Hazel Female 5
## 26 Brown Hazel Female 29
## 27 Red Hazel Female 7
## 28 Blond Hazel Female 5
## 29 Black Green Female 2
## 30 Brown Green Female 14
## 31 Red Green Female 7
## 32 Blond Green Female 8
The latter, rather impenetrable syntax was base R. With dplyr, we can use this (rather prettier) syntax:
library(dplyr)
HairEyeColor %>%
as_tibble() %>%
filter( Sex == "Female")
## # A tibble: 16 x 4
## Hair Eye Sex n
## <chr> <chr> <chr> <dbl>
## 1 Black Brown Female 36
## 2 Brown Brown Female 66
## 3 Red Brown Female 16
## 4 Blond Brown Female 4
## 5 Black Blue Female 9
## 6 Brown Blue Female 34
## 7 Red Blue Female 7
## 8 Blond Blue Female 64
## 9 Black Hazel Female 5
## 10 Brown Hazel Female 29
## 11 Red Hazel Female 7
## 12 Blond Hazel Female 5
## 13 Black Green Female 2
## 14 Brown Green Female 14
## 15 Red Green Female 7
## 16 Blond Green Female 8
## Number of brown-haired students
sum(HairEyeColor["Brown",,])
## [1] 286
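As a cross-check on the total above, the counts for every hair color can be obtained at once by summing over the other two dimensions. This is a small sketch using base R's margin.table(); the name `hair_totals` is just an illustrative choice.

```r
# Sum over eye color and sex, keeping only the first dimension (Hair)
hair_totals <- margin.table(HairEyeColor, margin = 1)
print(hair_totals)

# Pick out the brown-haired students by name
hair_totals[["Brown"]]
```

The second argument is the dimension to keep, so `margin = 2` would give totals per eye color and `margin = 3` totals per sex.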
## Which group is the smallest?
min(HairEyeColor)
## [1] 2
## Base R
which( HairEyeColor == min(HairEyeColor), arr.ind = TRUE )
## Hair Eye Sex
## Black 1 4 2
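The base R result gives positions along each dimension rather than labels. A sketch of how those positions could be translated back into level names via dimnames() follows; the names `idx`, `dn` and `smallest` are introduced here for illustration only.

```r
# Positions of the smallest cell(s): one row per match, one column per dimension
idx <- which(HairEyeColor == min(HairEyeColor), arr.ind = TRUE)

# Look up the label at each position in the table's dimnames
dn <- dimnames(HairEyeColor)
smallest <- data.frame(
  Hair = dn$Hair[idx[, "Hair"]],
  Eye  = dn$Eye[idx[, "Eye"]],
  Sex  = dn$Sex[idx[, "Sex"]],
  Freq = HairEyeColor[idx]   # matrix indexing returns the matching counts
)
print(smallest)
```

Because `idx` has one row per matching cell, this also works if several combinations tie for the minimum.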
## dplyr syntax
HairEyeColor %>% as_tibble() %>% filter(n == min(n))
## # A tibble: 1 x 4
## Hair Eye Sex n
## <chr> <chr> <chr> <dbl>
## 1 Black Green Female 2
a <- seq( from = 0, to = 0.8, by = 0.1)
b <- rep(1:3, 3)
c <- LETTERS[ 1:9 ]
D <- data.frame(fractions = a, numbers = b, letters = c)
D$logs <- log( D$numbers + D$fractions )
hist(D$logs)
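saveRDS(D, file = "MyData.RDS")
We’re saving the data frame as an .RDS file with saveRDS() (save() would instead write an .RData-style file). It can be read back with readRDS(). Unless we specify a path elsewhere, it will be saved in our working directory.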
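To check that a file written this way can be restored, here is a minimal round-trip sketch with saveRDS() and readRDS(). It rebuilds the same data frame and, as an assumption made for this example, writes to a temporary directory instead of the working directory; `path` and `D2` are illustrative names.

```r
# Rebuild the data frame from the exercise
D <- data.frame(
  fractions = seq(from = 0, to = 0.8, by = 0.1),
  numbers   = rep(1:3, 3),
  letters   = LETTERS[1:9]
)
D$logs <- log(D$numbers + D$fractions)

# Write a single object to disk, here in a temporary directory
path <- file.path(tempdir(), "MyData.RDS")
saveRDS(D, file = path)

# Read it back; readRDS() returns the object, so we choose its new name
D2 <- readRDS(path)
identical(D, D2)
```

Unlike load(), readRDS() does not restore the original variable name, so the result has to be assigned explicitly.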
getwd()
## [1] "C:/Users/tgw513/Documents/GitHub/Course-Materials-Bosnia/Contents/Material/Part A - Repetition"
You can change your working directory with the setwd function, but for more portable code, we recommend using RStudio projects instead. Go to the “File” menu and click “New Project”. Place the project file in the folder you want to work in for this project. When you click “Open Project” and select this project, RStudio will set the working directory to that folder until you close the project.
End of practical.