Repetition

Statistical Programming with R

Welcome back

Programme

Day	Time	Purpose/Details	Material	Lecturer
December 10	9:00 - 12:00	Repetition	A	Anne
December 10	13:00 - 15:30	Statistical inference	B	Jolien
December 11	9:00 - 12:00	Linear models	C	Jolien
December 11	13:00 - 15:30	Generalized linear models	D	Anne
December 12	9:00 - 12:00	Data validation and editing	E	Anne
December 12	13:00 - 15:30	Imputation	F	Jolien
December 13	9:00 - 12:00	Evaluation, agreement on summary mission report

Format

Lectures followed by practicals
Materials are on the homepage
- “Impracticals” are the versions without solutions, “Practicals” the ones with
Please do ask questions

What is R?

R is a language and environment for statistical computing and for graphics
GNU project (100% free software)
Managed by the R Foundation for Statistical Computing, Vienna, Austria.
Community-driven
Based on the object-oriented language S (1975)

What is RStudio?

Aggregates all convenient information and procedures into one single place
Allows you to work in projects
Manages your code with highlighting
Gives extra functionality (Shiny, knitr, markdown, LaTeX)
Allows for integration with version control routines, such as Git.

The R community

Huge, active, welcoming online community
- #rstats
- rweekly
- rbloggers
- Stack Overflow
Package development
- About eight packages supplied with base R
- More than 15,000 packages on CRAN

CRAN: Comprehensive R Archive Network

Task views

Using R

Help

Everything that is published on CRAN and is aimed at R users must be accompanied by a help file.
If you know the name of the function that performs an operation, e.g. anova(), just type ?anova or help(anova) in the console.
If you do not know the name of the function, type ?? followed by your search term. For example, ??anova returns a list of all help pages that contain the word ‘anova’.
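
The help lookups described above can also be written as ordinary function calls. The following is a minimal sketch; apropos() is not mentioned on the slide and is added here as a related base R function for finding objects by name.

```r
# Open the help page for a function whose name you know
help("anova")         # same as ?anova

# Search the installed help pages for a keyword
help.search("anova")  # same as ??anova

# List objects on the search path whose names contain a pattern
matches <- apropos("anova")
matches
```

The long forms are useful inside scripts, where the search term can be stored in a variable.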

The basics

Write commands directly in the console
Or write code in the editor and submit with Ctrl + Enter
Assign values to objects with <-

a <- c(1, 8, 42, pi, 2^3, 1)
b <- 0:5
c <- rep( c("Hi", "there"), 3)
D <- as.data.frame(cbind(a,b,c))
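
Once objects exist, it helps to check what they contain. The sketch below recreates the three vectors from this slide and inspects them with base R functions; the choice of inspection functions is illustrative, not part of the original slide.

```r
a <- c(1, 8, 42, pi, 2^3, 1)
b <- 0:5
c <- rep(c("Hi", "there"), 3)

a          # typing an object's name prints its value
length(a)  # number of elements: 6
class(a)   # "numeric"
class(b)   # "integer"
class(c)   # "character"
ls()       # names of the objects currently in the workspace
```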

Objects

Vectors and character vectors

c(1,2,3,4,5)

## [1] 1 2 3 4 5

1:5

## [1] 1 2 3 4 5

as.character(1:5)

## [1] "1" "2" "3" "4" "5"

Objects

Matrices

matrix(1:12, nrow = 3)

##      [,1] [,2] [,3] [,4]
## [1,]    1    4    7   10
## [2,]    2    5    8   11
## [3,]    3    6    9   12

The elements of a vector or matrix must all have the same type
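
A short sketch of what "same type" means in practice: when types are mixed, R silently converts all elements to the most general type. The object names are illustrative.

```r
# Mixing numbers, text and logicals gives a character vector
x <- c(1, "a", TRUE)
class(x)  # "character"
x         # "1" "a" "TRUE"

# Mixing numbers and logicals gives a numeric vector (TRUE -> 1, FALSE -> 0)
y <- c(1, TRUE, FALSE)
class(y)  # "numeric"

# The same rule holds for matrices: one text element makes everything text
m <- matrix(c(1, 2, "three", 4), nrow = 2)
typeof(m) # "character"

# A list can hold elements of different types
l <- list(1, "a", TRUE)
sapply(l, class)
```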

Data frames

Data sets for R

D <- as.data.frame(cbind(a,b,c))
D

##                  a b     c
## 1                1 0    Hi
## 2                8 1 there
## 3               42 2    Hi
## 4 3.14159265358979 3 there
## 5                8 4    Hi
## 6                1 5 there

Alternatively: tibbles or data.tables
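
Note that cbind() first builds a matrix, so all three columns above are converted to text (which is why pi is printed with full precision). A minimal sketch of an alternative, assuming the same vectors a, b and c: data.frame() stores each vector as its own column and keeps its type. The names D_coerced and D_typed are illustrative.

```r
a <- c(1, 8, 42, pi, 2^3, 1)
b <- 0:5
c <- rep(c("Hi", "there"), 3)

# Via cbind(): every column goes through a character matrix
D_coerced <- as.data.frame(cbind(a, b, c))
is.numeric(D_coerced$a)  # FALSE

# Via data.frame(): numeric columns stay numeric
D_typed <- data.frame(a, b, c)
is.numeric(D_typed$a)    # TRUE
str(D_typed)             # compact overview of column types
```

Whether column c is stored as character or as a factor depends on the R version (the default changed in R 4.0.0).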

Subsetting

D[1,3]

## [1] Hi
## Levels: Hi there

D[1,]

##   a b  c
## 1 1 0 Hi

D[,2]

## [1] 0 1 2 3 4 5
## Levels: 0 1 2 3 4 5
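
Besides row and column positions, data frames can be subset by column name and by logical condition. The sketch below assumes a data frame built with data.frame() rather than cbind(), so that the numeric columns can be compared; D_typed is an illustrative name.

```r
D_typed <- data.frame(a = c(1, 8, 42, pi, 2^3, 1),
                      b = 0:5,
                      c = rep(c("Hi", "there"), 3))

D_typed$b                 # one column by name
D_typed[, c("a", "b")]    # several columns by name
D_typed[D_typed$a > 5, ]  # rows where a is larger than 5 (rows 2, 3 and 5)
D_typed[["c"]]            # one column as a vector
```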

Pipes and the tidyverse

The tidyverse is a collection of packages that “share an underlying design philosophy, grammar, and data structures”. They make for easier data handling and visualization.

The pipe %>% chains function calls, passing the result of one step on to the next, which makes code much more readable

library(dplyr)

starwars %>%
  filter(species == "Human") %>%   # keep only human characters
  group_by(homeworld) %>%          # one group per home planet
  summarise(n = n(), mean.height = mean(height, na.rm = TRUE))  # ignore missing heights
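
To see what the pipe does, it can help to compare a nested call with its piped equivalent. A minimal sketch on a small made-up vector: x %>% f(y) is evaluated as f(x, y).

```r
library(dplyr)  # makes the %>% pipe available

x <- c(1, 4, 9, 16)

# Nested calls are read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls are read from left to right
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)  # TRUE
```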

Plotting

Plots in base R are fast and easy:

plot(starwars$height, starwars$mass)

hist(starwars$height)
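
Both calls accept further arguments for labels and layout. The sketch below is one possible variation; the label texts and the number of bins are arbitrary choices, and the units follow the documentation of the starwars data (height in cm, mass in kg).

```r
library(dplyr)  # provides the starwars data set used on these slides

# Scatter plot with axis labels and a title
plot(starwars$height, starwars$mass,
     xlab = "Height (cm)", ylab = "Mass (kg)",
     main = "Star Wars characters")

# Histogram with roughly 20 bins; hist() also returns the bin counts
h <- hist(starwars$height, breaks = 20,
          xlab = "Height (cm)", main = "Distribution of height")
sum(h$counts)  # number of characters with a recorded height
```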

ggplot2

library(ggplot2)
starwars %>% ggplot( aes( x = height)) + geom_histogram()
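
A ggplot is built in layers that are added with +. As a sketch of the same idea with two variables, the example below draws a scatter plot of mass against height; the filtering step and the axis labels are illustrative additions.

```r
library(dplyr)
library(ggplot2)

p <- starwars %>%
  filter(!is.na(height), !is.na(mass)) %>%  # drop characters with missing values
  ggplot(aes(x = height, y = mass)) +       # map variables to the axes
  geom_point() +                            # draw one point per character
  labs(x = "Height (cm)", y = "Mass (kg)")

p  # printing the object draws the plot
```

Storing the plot in an object makes it easy to add further layers later, or to save it with ggsave().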

Practical