Statistical Programming with R
| Day | Time | Purpose/Details | Material | Lecturer |
|---|---|---|---|---|
| December 10 | 9:00 - 12:00 | Repetition | A | Anne |
| December 10 | 13:00 - 15:30 | Statistical inference | B | Jolien |
| December 11 | 9:00 - 12:00 | Linear models | C | Jolien |
| December 11 | 13:00 - 15:30 | Generalized linear models | D | Anne |
| December 12 | 9:00 - 12:00 | Data validation and editing | E | Anne |
| December 12 | 13:00 - 15:30 | Imputation | F | Jolien |
| December 13 | 9:00 - 12:00 | Evaluation, agreement on summary mission report | | |
R is a language and environment for statistical computing and for graphics
GNU project (100% free software)
Managed by the R Foundation for Statistical Computing, Vienna, Austria.
Community-driven
Based on the object-oriented language S (1975)
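Anything made available to R users must be accompanied by a help file.
If you know the name of a function, for example anova(), type ?anova or help(anova) in the console.
If you do not know the name, type ?? followed by your search criterion. For example, ??anova returns a list of all help pages that contain the word ‘anova’.
Values are assigned to objects with <-
a <- c(1, 8, 42, pi, 2^3, 1)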
b <- 0:5
c <- rep( c("Hi", "there"), 3)
D <- as.data.frame(cbind(a,b,c))
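To check what the assignments above produced, the objects can be inspected with class() and length(). This is a minimal sketch that repeats the three assignments so that it runs on its own in a fresh session:

```r
# Repeat the assignments from the text so the block is self-contained
a <- c(1, 8, 42, pi, 2^3, 1)   # numeric (double) vector
b <- 0:5                       # integer sequence
c <- rep(c("Hi", "there"), 3)  # character vector; c() the function still works

class(a)    # "numeric"
class(b)    # "integer"
class(c)    # "character"
length(c)   # 6
a[5]        # 8, because 2^3 is evaluated before it is stored
```

Naming an object c is legal because R still finds the function c() when it is called, but a more descriptive name is usually easier to read.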
Vectors and character vectors
c(1,2,3,4,5)
## [1] 1 2 3 4 5
1:5
## [1] 1 2 3 4 5
as.character(1:5)
## [1] "1" "2" "3" "4" "5"
Matrices
matrix(1:12, nrow = 3)
##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12
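The output above shows that matrix() fills column by column. As a small sketch of the alternative, the byrow argument switches to row-wise filling; the object names m_col and m_row are only illustrative:

```r
m_col <- matrix(1:12, nrow = 3)                # default: filled column by column
m_row <- matrix(1:12, nrow = 3, byrow = TRUE)  # filled row by row

dim(m_col)    # 3 4
m_col[2, 3]   # 8
m_row[2, 3]   # 7
m_row[1, ]    # 1 2 3 4
```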
The elements of a vector or matrix must all have the same type
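A short sketch of what the same-type rule means in practice: when types are mixed, R silently converts all elements to the most general type. The object names x, y and m are only illustrative:

```r
# Mixing numbers, strings and logicals yields a character vector
x <- c(1, "a", TRUE)
class(x)   # "character"
x          # "1" "a" "TRUE"

# Mixing numbers and logicals yields a numeric vector
y <- c(1, TRUE, FALSE)
class(y)   # "numeric"
y          # 1 1 0

# The same rule applies to matrices: one character column converts everything
m <- cbind(1:2, c("Hi", "there"))
typeof(m)  # "character"
```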
Data sets for R
D <- as.data.frame(cbind(a,b,c))
D
##                  a b     c
## 1                1 0    Hi
## 2                8 1 there
## 3               42 2    Hi
## 4 3.14159265358979 3 there
## 5                8 4    Hi
## 6                1 5 there
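In the printed data frame, pi appears with all its digits because cbind() first builds a character matrix, so every column is stored as text. As a sketch of one possible alternative, calling data.frame() directly keeps each column's own type. The names D_cbind and D_df are hypothetical and only used for this comparison:

```r
a <- c(1, 8, 42, pi, 2^3, 1)
b <- 0:5
c <- rep(c("Hi", "there"), 3)

D_cbind <- as.data.frame(cbind(a, b, c))  # goes through a character matrix
D_df    <- data.frame(a, b, c)            # columns keep their own types

is.numeric(D_cbind$a)   # FALSE: stored as text (a factor before R 4.0)
is.numeric(D_df$a)      # TRUE
is.integer(D_df$b)      # TRUE
mean(D_df$a)            # works because the column is numeric
```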
Alternatively: tibbles or data.tables
D[1,3]
## [1] Hi
## Levels: Hi there
D[1,]
##   a b  c
## 1 1 0 Hi
D[,2]
## [1] 0 1 2 3 4 5
## Levels: 0 1 2 3 4 5
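Besides numeric positions, rows and columns can also be selected by name or by a logical condition. The sketch below assumes a data frame built with data.frame() so that the columns keep their types. The name D_typed is hypothetical and is not the D used above:

```r
# Hypothetical typed version of the example data
D_typed <- data.frame(
  a = c(1, 8, 42, pi, 2^3, 1),
  b = 0:5,
  c = rep(c("Hi", "there"), 3)
)

D_typed$b                        # column by name: 0 1 2 3 4 5
D_typed[D_typed$a > 5, ]         # rows where a exceeds 5: rows 2, 3 and 5
D_typed[D_typed$c == "Hi", "a"]  # a for the "Hi" rows: 1 42 8
```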
The tidyverse is a collection of packages that “share an underlying design philosophy, grammar, and data structures”. They make for easier data handling and visualization.
The pipe %>% passes the result of one function call on as the first argument of the next, so calls can be chained and the code reads in the order in which it is executed.
library(dplyr)
starwars %>%
  subset(species == "Human") %>%
  group_by(homeworld) %>%
  summarise(n = n(), mean.height = mean(height))
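To make the chaining concrete, the sketch below first compares a piped expression with its nested equivalent. It then shows a variation of the summary above that uses dplyr's filter() instead of subset() and na.rm = TRUE to ignore missing heights. Both changes are illustrative assumptions, not part of the original example, and the names piped, nested and humans are hypothetical:

```r
library(dplyr)  # provides %>% and the starwars data

# x %>% f() is evaluated as f(x), so these two lines compute the same value
piped  <- c(1, 4, 9) %>% sqrt() %>% sum()
nested <- sum(sqrt(c(1, 4, 9)))
piped == nested   # TRUE, both are 6

# Variation on the summary in the text: ignore missing heights in the mean
humans <- starwars %>%
  filter(species == "Human") %>%
  group_by(homeworld) %>%
  summarise(n = n(), mean.height = mean(height, na.rm = TRUE))
```

Without na.rm = TRUE, any group that contains a missing height gets NA as its mean.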
Plots in base R are fast and easy:
plot(starwars$height, starwars$mass)
hist(starwars$height)
library(ggplot2)
starwars %>%
  ggplot(aes(x = height)) +
  geom_histogram()
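For comparison with the base R scatter plot of height against mass, a ggplot2 version might look like the sketch below. The axis labels and the object name p are additions for illustration, and na.rm = TRUE only suppresses the warning about rows with missing values:

```r
library(dplyr)    # provides the starwars data
library(ggplot2)

# Build the plot as an object, then print it
p <- ggplot(starwars, aes(x = height, y = mass)) +
  geom_point(na.rm = TRUE) +
  labs(x = "Height (cm)", y = "Mass (kg)")
p
```

Storing the plot in an object makes it easy to add further layers later, for example p + geom_smooth().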