Using the dataset MathAchieve
in the MEMSS
package carry out an analysis that treats School
as a random effect. Answer the following questions:
-Which variables should be treated as fixed effects?
-Are differences between schools greater than can be explained by within-school variation?
-Is the relation between SES and MathAch different for different schools? Compare the models using deviance difference tests.
-Discuss how the decision whether to treat `School` as a fixed or a random effect might depend on the purpose of the study.
Minority, SES, Sex and MEANSES can be treated as fixed effects (SES can also be treated as random).
To obtain deviances we use full maximum likelihood estimation (REML=F). First fit an empty model, to see whether there are any differences between schools:
#Load the required packages into the environment.
library(MEMSS)
library(lme4)
#Fit an intercept-only model.
fit1 <- lmer(MathAch ~ 1 + (1 | School), data = MathAchieve, REML=F)
#Look at the summary of the fit object.
summary(fit1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: MathAch ~ 1 + (1 | School)
## Data: MathAchieve
##
## AIC BIC logLik deviance df.resid
## 47121.8 47142.4 -23557.9 47115.8 7182
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.06262 -0.75365 0.02676 0.76070 2.74184
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 8.553 2.925
## Residual 39.148 6.257
## Number of obs: 7185, groups: School, 160
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 12.6371 0.2436 51.87
The between school (2nd level) variance is 8.55, if we fit a new model using within school effects (note that we need to include MEANSES, a between school variable, to make sure that SES is a ‘pure’ within school and MEANSES a ‘pure’ between school effect).
#Fit a model with the within school predictors and MEANSES.
fit2 <- lmer(MathAch ~ Minority + SES + Sex + MEANSES +
(1 | School),
data = MathAchieve, REML=F)
#Look at the summary of the fit object.
summary(fit2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: MathAch ~ Minority + SES + Sex + MEANSES + (1 | School)
## Data: MathAchieve
##
## AIC BIC logLik deviance df.resid
## 46347.3 46395.4 -23166.6 46333.3 7178
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2173 -0.7222 0.0364 0.7629 2.9090
##
## Random effects:
## Groups Name Variance Std.Dev.
## School (Intercept) 2.396 1.548
## Residual 35.886 5.990
## Number of obs: 7185, groups: School, 160
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 12.8298 0.1708 75.132
## MinorityYes -2.7282 0.2032 -13.427
## SES 1.9265 0.1084 17.766
## SexMale 1.2185 0.1607 7.582
## MEANSES 2.8820 0.3652 7.891
##
## Correlation of Fixed Effects:
## (Intr) MnrtyY SES SexMal
## MinorityYes -0.319
## SES -0.024 0.155
## SexMale -0.444 -0.016 -0.051
## MEANSES -0.041 0.148 -0.265 -0.026
#Compare the fits of the intercept-only model and model with within school predictors.
anova(fit1, fit2)
## Data: MathAchieve
## Models:
## fit1: MathAch ~ 1 + (1 | School)
## fit2: MathAch ~ Minority + SES + Sex + MEANSES + (1 | School)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## fit1 3 47122 47142 -23558 47116
## fit2 7 46347 46395 -23167 46333 782.54 4 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that part of the between school variance is explained (8.55-2.40=6.15). The deviance difference between this and the previous model is equal to (47115.8-46333.3=782.5, df=4, p<0.001).
To check whether the relation between SES and MathAch is different for different schools we add a random slope for SES.
#Fit a model with a random slope for MEANSES.
fit3 <- lmer(MathAch ~ Minority + SES + Sex + MEANSES +
(1 + SES | School),
data = MathAchieve, REML=F)
#Look at the summary of the fit object.
summary(fit3)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: MathAch ~ Minority + SES + Sex + MEANSES + (1 + SES | School)
## Data: MathAchieve
##
## AIC BIC logLik deviance df.resid
## 46339.2 46401.1 -23160.6 46321.2 7176
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2264 -0.7202 0.0357 0.7610 2.9104
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## School (Intercept) 2.3318 1.5270
## SES 0.3198 0.5655 -0.65
## Residual 35.7443 5.9787
## Number of obs: 7185, groups: School, 160
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 12.8381 0.1702 75.414
## MinorityYes -2.8004 0.2037 -13.748
## SES 1.9237 0.1177 16.346
## SexMale 1.2097 0.1599 7.566
## MEANSES 3.0462 0.3575 8.520
##
## Correlation of Fixed Effects:
## (Intr) MnrtyY SES SexMal
## MinorityYes -0.322
## SES -0.203 0.139
## SexMale -0.445 -0.012 -0.044
## MEANSES -0.088 0.132 -0.246 -0.023
#Compare the fits of the previous three models.
anova(fit1, fit2, fit3)
## Data: MathAchieve
## Models:
## fit1: MathAch ~ 1 + (1 | School)
## fit2: MathAch ~ Minority + SES + Sex + MEANSES + (1 | School)
## fit3: MathAch ~ Minority + SES + Sex + MEANSES + (1 + SES | School)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## fit1 3 47122 47142 -23558 47116
## fit2 7 46347 46395 -23167 46333 782.543 4 < 2.2e-16 ***
## fit3 9 46339 46401 -23161 46321 12.081 2 0.002381 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that there is some random slope variance (0.32), The deviance difference between this and the previous model is equal to (46333-46321=12, df=2, p=0.002). The best model for this data, based on the deviances, is thus model 3 (with random slope for SES).
School
as a fixed or a random effect might depend on the purpose of the study.If the purpose of the study is to generalize to all schools in a specific region/country, and a random sample of schools was taken, the school variable can be considered random. If the purpose is to compare a specific set of schools, not randomly sampled, school is fixed.
The file nurses.sav holds data from nurses working in wards nested within hospitals (Hox, J.J. (2002). Multilevel analysis: Techniques and applications. Hove: Routledge). It is from a hypothetical study on stress in hospitals where wards were randomly assigned to and experimental or control condition. It contains the variables age
(years), experience
(years), gender
(0=m, 1=f), wardtype
(0=general care, 1=special care), expcon
(0=control, 1=experimental), hospsize
(0=small, 1=medium, 2=large), stress
(likert scale 1-7) and the id variables hospital
, ward
and nurse
.
Load this file into R using the function read_spss()
from the haven
package, grand mean center (centering based on the mean over all hospitals) the variables age
and experience
and answer the following questions:
- Are differences between wards greater than can be explained by ward and nurse varying variables?
- Are differences between hospitals greater than can be explained by hospital, ward and nurse varying
variables?
- Is the relation between experimental condition and mean ward stress-score the same in all hospitals?
- If this relation is different, can the size of the hospital explain (a part of) these different
relations?
#Load the required packages into the environment. Note that we already loaded the lme4 package in the previous exercise so actually we do not need to load it again.
require(haven)
require(lme4)
#Load the data into R
Nurses <- read_spss("nurses.sav")
#Don't forget to set working directory correctly or to specify a different file location.
First, fit an empty (intercept-only) model
fit1 <- lmer(stress ~ 1 + (1 | hospital) + (1 | hospital:ward),
data = Nurses, REML=F)
summary(fit1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + (1 | hospital) + (1 | hospital:ward)
## Data: Nurses
##
## AIC BIC logLik deviance df.resid
## 1950.4 1970.0 -971.2 1942.4 996
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.97546 -0.65825 0.01737 0.67432 3.11281
##
## Random effects:
## Groups Name Variance Std.Dev.
## hospital:ward (Intercept) 0.4888 0.6992
## hospital (Intercept) 0.1621 0.4027
## Residual 0.3013 0.5489
## Number of obs: 1000, groups: hospital:ward, 100; hospital, 25
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 5.0010 0.1081 46.28
We see that most of the variation lies at the ward level (0.49). Next we fit a model with fixed effects (remember to grand mean center age
and experience
). Grand mean centering is done for interpretation purposes. There are other ways of centering variables (group mean centering) that lead to different interpretations of the effects.
#Grand mean center the age and experience variables
Nurses$age <- Nurses$age - mean(Nurses$age)
Nurses$experience <- Nurses$experience - mean(Nurses$experience)
#Fit the fixed-effects model
fit2 <- lmer(stress ~ 1 + age + gender + experience + wardtype + expcon + hospsize +
(1 | hospital) + (1 | hospital:ward),
data = Nurses, REML=F)
summary(fit2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + age + gender + experience + wardtype + expcon +
## hospsize + (1 | hospital) + (1 | hospital:ward)
## Data: Nurses
##
## AIC BIC logLik deviance df.resid
## 1624.4 1673.4 -802.2 1604.4 990
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.63602 -0.69954 -0.00082 0.66296 2.76547
##
## Random effects:
## Groups Name Variance Std.Dev.
## hospital:ward (Intercept) 0.3270 0.5718
## hospital (Intercept) 0.0970 0.3114
## Residual 0.2167 0.4655
## Number of obs: 1000, groups: hospital:ward, 100; hospital, 25
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 5.291313 0.157550 33.585
## age 0.022104 0.002196 10.066
## gender -0.453222 0.034929 -12.975
## experience -0.061640 0.004467 -13.798
## wardtype 0.050839 0.118143 0.430
## expcon -0.699914 0.118143 -5.924
## hospsize 0.458554 0.123925 3.700
##
## Correlation of Fixed Effects:
## (Intr) age gender exprnc wrdtyp expcon
## age 0.004
## gender -0.162 -0.014
## experience -0.009 -0.817 0.028
## wardtype -0.375 -0.007 -0.002 0.008
## expcon -0.375 0.005 -0.004 0.000 0.000
## hospsize -0.629 -0.001 0.003 0.002 0.000 0.000
We see that the variance at all levels decreases, so some of it is explained by hospital, nurse and ward varying variables but not all.
To test whether the relation between experimental condition and mean ward stress-score is the same in all hospitals we add a random effect for experimental condition.
fit3 <- lmer(stress ~ 1 + age + gender + experience + wardtype + expcon + hospsize +
(1 + expcon | hospital) + (1 | hospital:ward),
data = Nurses, REML=F)
summary(fit3)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + age + gender + experience + wardtype + expcon +
## hospsize + (1 + expcon | hospital) + (1 | hospital:ward)
## Data: Nurses
##
## AIC BIC logLik deviance df.resid
## 1598.2 1657.0 -787.1 1574.2 988
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.69124 -0.69547 0.00611 0.63756 2.78015
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## hospital:ward (Intercept) 0.1087 0.3297
## hospital (Intercept) 0.1941 0.4405
## expcon 0.6572 0.8107 -0.54
## Residual 0.2166 0.4654
## Number of obs: 1000, groups: hospital:ward, 100; hospital, 25
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 5.428582 0.147171 36.886
## age 0.022228 0.002193 10.136
## gender -0.454663 0.034853 -13.045
## experience -0.061648 0.004456 -13.835
## wardtype 0.053140 0.072291 0.735
## expcon -0.698592 0.177530 -3.935
## hospsize 0.286090 0.120785 2.369
##
## Correlation of Fixed Effects:
## (Intr) age gender exprnc wrdtyp expcon
## age 0.006
## gender -0.174 -0.014
## experience -0.011 -0.817 0.028
## wardtype -0.245 -0.011 -0.004 0.013
## expcon -0.394 0.003 -0.002 0.000 0.000
## hospsize -0.657 -0.004 0.004 0.004 0.000 0.000
Indeed, the variance of the random slope for experimental condition is 0.66, leading us to assume that the relation between experimental condition and mean ward stress-score is different between hospitals (formally however, we would like to test whether this variance is really diferent from 0).
To see whether part of this variation in slopes can be explained by hospital size we add a so called “cross-level interaction”
fit4 <- lmer(stress ~ 1 + age + gender + experience + wardtype + expcon*hospsize +
(1 + expcon | hospital) + (1 | hospital:ward),
data = Nurses, REML=F)
summary(fit4)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: stress ~ 1 + age + gender + experience + wardtype + expcon *
## hospsize + (1 + expcon | hospital) + (1 | hospital:ward)
## Data: Nurses
##
## AIC BIC logLik deviance df.resid
## 1576.8 1640.6 -775.4 1550.8 987
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.77549 -0.69778 0.01179 0.64256 2.81373
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## hospital:ward (Intercept) 0.1087 0.3297
## hospital (Intercept) 0.1429 0.3781
## expcon 0.1785 0.4225 -0.23
## Residual 0.2166 0.4654
## Number of obs: 1000, groups: hospital:ward, 100; hospital, 25
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 5.690390 0.146266 38.904
## age 0.022297 0.002192 10.172
## gender -0.454752 0.034833 -13.055
## experience -0.061808 0.004452 -13.882
## wardtype 0.053134 0.072289 0.735
## expcon -1.497540 0.169711 -8.824
## hospsize -0.041006 0.131823 -0.311
## expcon:hospsize 0.998447 0.160577 6.218
##
## Correlation of Fixed Effects:
## (Intr) age gender exprnc wrdtyp expcon hospsz
## age 0.011
## gender -0.176 -0.014
## experience -0.014 -0.817 0.028
## wardtype -0.247 -0.011 -0.004 0.013
## expcon -0.380 -0.008 0.000 0.008 0.000
## hospsize -0.721 -0.010 0.005 0.008 0.000 0.302
## expcn:hspsz 0.288 0.015 -0.003 -0.010 0.000 -0.755 -0.400
anova(fit1, fit2, fit3, fit4)
## Data: Nurses
## Models:
## fit1: stress ~ 1 + (1 | hospital) + (1 | hospital:ward)
## fit2: stress ~ 1 + age + gender + experience + wardtype + expcon +
## fit2: hospsize + (1 | hospital) + (1 | hospital:ward)
## fit3: stress ~ 1 + age + gender + experience + wardtype + expcon +
## fit3: hospsize + (1 + expcon | hospital) + (1 | hospital:ward)
## fit4: stress ~ 1 + age + gender + experience + wardtype + expcon *
## fit4: hospsize + (1 + expcon | hospital) + (1 | hospital:ward)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## fit1 4 1950.4 1970.0 -971.18 1942.4
## fit2 10 1624.4 1673.4 -802.18 1604.4 337.997 6 < 2.2e-16 ***
## fit3 12 1598.2 1657.0 -787.08 1574.2 30.210 2 2.754e-07 ***
## fit4 13 1576.8 1640.6 -775.38 1550.8 23.394 1 1.320e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see that the slope variance of experimental condition decreases from 0.66 to 0.18. The coefficient for the cross-level interaction is equal to 0.998, t=6.22 (so probably significant if we formally test it). If we do a deviance difference test on the previous two models (1574.2-1550.8=23.4, df=1, p<0.01) we see that the model with cross-level interaction fits better.
End of Practical
.