Exercises


As last time, it is wise to start by fixing the random seed so that your results are reproducible.

set.seed(123)

  1. Generate two random samples of 10 numbers from a normal distribution with the specifications below. For each sample, test the null hypothesis that the population mean is 0.

  1. Write a function that generates a random sample of n numbers from a normal distribution with a user-defined mean (i.e. a mean that you can choose when running the function) and standard deviation 1, and returns the \(p\)-value for the test that the mean is 0.
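One possible shape for such a function is sketched below. It assumes a one-sample t-test is the intended test, which the exercise does not state explicitly, and the name `p_value_mean_zero` is made up for this illustration.

```r
# Hypothetical helper name; the exercise does not prescribe one.
# Assumption: the test that the mean is 0 is a one-sample t-test.
p_value_mean_zero <- function(n, mu = 0) {
  x <- rnorm(n, mean = mu, sd = 1)  # sample of size n, standard deviation fixed at 1
  t.test(x, mu = 0)$p.value         # p-value for H0: population mean equals 0
}

set.seed(123)
p_value_mean_zero(n = 10, mu = 0)
```

Note that the argument `mu` of the helper is the true mean used for simulation, while `mu = 0` inside `t.test()` is the hypothesised mean.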

  1. Use the function of Exercise 3 to generate 50 \(p\)-values with \(n=10,\mu=0\), and make a QQ-plot to compare the distribution of the \(p\)-values with a uniform \([0,1]\) variable.
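A minimal sketch of the simulation and plot follows. The helper `p_value_mean_zero` is a hypothetical name repeated here so the snippet runs on its own, and it again assumes a one-sample t-test. The theoretical quantiles come from `qunif(ppoints(50))`.

```r
# Hypothetical helper (assumption: one-sample t-test of H0: mean = 0).
p_value_mean_zero <- function(n, mu = 0) {
  x <- rnorm(n, mean = mu, sd = 1)
  t.test(x, mu = 0)$p.value
}

set.seed(123)
# Draw 50 independent p-values, each from a fresh sample of size 10 with true mean 0
p_values <- replicate(50, p_value_mean_zero(n = 10, mu = 0))

# Plot uniform[0, 1] quantiles against the ordered p-values
qqplot(qunif(ppoints(50)), p_values,
       xlab = "Theoretical uniform quantiles",
       ylab = "Observed p-values")
abline(0, 1)  # reference line: points near it suggest a uniform distribution
```

Under a true null hypothesis the \(p\)-values should be roughly uniform, so the points are expected to scatter around the reference line.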

In a study that examined the use of acupuncture to treat migraine headaches, consenting patients on a waiting list for treatment for migraine were randomly assigned in a 2:1:1 ratio to acupuncture treatment, a “sham” acupuncture treatment in which needles were inserted at non-acupuncture points, and waiting-list patients whose only treatment was self-administered (Linde et al., 2005). The “sham” acupuncture treatment was described to trial participants as an acupuncture treatment that did not follow the principles of Chinese medicine.


  1. What is the conclusion when the outcome is classified according to numbers of patients who experienced a greater than 50% reduction in headaches over a four-week period, relative to a pre-randomization baseline?

Use the following data:

data <- matrix(c(74, 71, 43, 38, 11, 65), nrow = 2, ncol = 3)
colnames(data) <- c("Acupuncture", "Sham", "Waiting list")
rownames(data) <- c("> 50% reduction", "< 50% reduction")
data
##                 Acupuncture Sham Waiting list
## > 50% reduction          74   43           11
## < 50% reduction          71   38           65
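
One way to approach this question is a chi-squared test of independence between treatment group and outcome. The sketch below rebuilds the table under the hypothetical name `outcome` and checks the expected counts before looking at the test result. Other analyses are possible.

```r
# Same counts as above, stored under a hypothetical name
outcome <- matrix(c(74, 71, 43, 38, 11, 65), nrow = 2, ncol = 3,
                  dimnames = list(c("> 50% reduction", "< 50% reduction"),
                                  c("Acupuncture", "Sham", "Waiting list")))

result <- chisq.test(outcome)  # Pearson chi-squared test of independence
result$expected                # expected counts under independence
result                         # test statistic, degrees of freedom, p-value
```

If all expected counts are reasonably large, the chi-squared approximation is usually considered acceptable.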

  1. Patients who received the acupuncture and sham acupuncture treatments were asked to guess their treatment at the end of their trial. What would you conclude from these data?

data <- matrix(c(82, 17, 30, 30, 26, 16), nrow = 3, ncol = 2)
colnames(data) <- c("Acupuncture", "Sham")
rownames(data) <- c("Chinese", "Other", "Don't know")
data
##            Acupuncture Sham
## Chinese             82   30
## Other               17   26
## Don't know          30   16
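
Before testing, it can help to look at the guesses as proportions within each treatment group. The sketch below uses the hypothetical name `guess` for the table, computes column proportions with `prop.table()`, and then applies a chi-squared test as one possible analysis.

```r
# Same counts as above, stored under a hypothetical name
guess <- matrix(c(82, 17, 30, 30, 26, 16), nrow = 3, ncol = 2,
                dimnames = list(c("Chinese", "Other", "Don't know"),
                                c("Acupuncture", "Sham")))

# Proportion of each guess within a treatment group (columns sum to 1)
col_props <- prop.table(guess, margin = 2)
round(col_props, 2)

guess_test <- chisq.test(guess)  # are guesses independent of actual treatment?
guess_test
```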

  1. Write a function that chooses automatically whether to do the chisq.test() or the fisher.test(). Create the function such that it:

  1. Test the function with the dataset bacteria (from MASS) by testing independence between compliance (hilo) and the presence or absence of disease (y).

  1. Does your function work differently if we only put in the first 25 rows of the bacteria dataset?
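
A minimal sketch of such a function is given below. The decision rule used here (switch to Fisher's exact test when any expected count is below 5) is a common rule of thumb and an assumption of this sketch, not necessarily the criterion intended by the exercise. The name `independence_test` and the two small example tables are likewise made up for illustration.

```r
# Hypothetical helper; the rule "any expected count < 5" is an assumption.
independence_test <- function(x, y = NULL) {
  # Accept either a ready-made contingency table or two vectors/factors
  tab <- if (is.null(y)) as.matrix(x) else table(x, y)
  # Expected counts under independence: (row total * column total) / grand total
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  if (any(expected < 5)) fisher.test(tab) else chisq.test(tab)
}

# Illustrative tables (made-up counts)
large_table <- matrix(c(30, 20, 25, 35), nrow = 2)
small_table <- matrix(c(3, 1, 1, 3), nrow = 2)

large_result <- independence_test(large_table)  # all expected counts are at least 5
small_result <- independence_test(small_table)  # expected counts are 2
large_result$method
small_result$method

# Applying it to the bacteria data, if the MASS package is available
if (requireNamespace("MASS", quietly = TRUE)) {
  print(with(MASS::bacteria, independence_test(hilo, y)))
}
```

Inspecting the `method` element of the result shows which test was chosen, which is a convenient way to compare the full dataset with a subset.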

End of practical.