We use the following packages:
library(mice)
library(dplyr)
library(magrittr)
library(DAAG)
The following table shows numbers of occasions when inhibition (i.e., no flow of current across a membrane) occurred within 120 s, for different concentrations of the protein peptide-C. The outcome yes implies that inhibition has occurred.
conc 0.1 0.5 1 10 20 30 50 70 80 100 150
no 7 1 10 9 2 9 13 1 1 4 3
yes 0 0 3 4 0 6 7 0 0 1 7
Rdata <- data.frame(conc = c(.1, .5, 1, 10, 20, 30, 50, 70, 80, 100, 150),
no = c(7, 1, 10, 9, 2, 9, 13, 1, 1, 4, 3),
yes = c(0, 0, 3, 4, 0, 6, 7, 0, 0, 1 ,7))
data
## conc no yes
## 1 0.1 7 0
## 2 0.5 1 0
## 3 1.0 10 3
## 4 10.0 9 4
## 5 20.0 2 0
## 6 30.0 9 6
## 7 50.0 13 7
## 8 70.0 1 0
## 9 80.0 1 0
## 10 100.0 4 1
## 11 150.0 3 7
no and yes over the rows (i.e. no + yes)yes over margin) data <-
data %>%
mutate(margin = no+yes,
prop = yes / margin)
margin as the weightsfit <-
data %$%
glm(prop ~ conc, family=binomial, weights=margin)
summary(fit)
##
## Call:
## glm(formula = prop ~ conc, family = binomial, weights = margin)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8159 -1.0552 -0.6878 0.3667 1.0315
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.32701 0.33837 -3.922 8.79e-05 ***
## conc 0.01215 0.00496 2.450 0.0143 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 16.683 on 10 degrees of freedom
## Residual deviance: 10.389 on 9 degrees of freedom
## AIC: 30.988
##
## Number of Fisher Scoring iterations: 4
fitplot(fit, which = c(1, 5))
The data set is small, but case 11 stands out in the Residuals vs. Leverage plot
conc is somewhat skewed. Run the model again with a log-transformation for conc. To apply this in the model directly:
fit.log <-
data %$%
glm(prop ~ I(log(conc)), family=binomial, weights=margin)
summary(fit.log)
##
## Call:
## glm(formula = prop ~ I(log(conc)), family = binomial, weights = margin)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2510 -1.0599 -0.5029 0.3152 1.3513
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.7659 0.5209 -3.390 0.000699 ***
## I(log(conc)) 0.3437 0.1440 2.387 0.016975 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 16.6834 on 10 degrees of freedom
## Residual deviance: 9.3947 on 9 degrees of freedom
## AIC: 29.994
##
## Number of Fisher Scoring iterations: 4
The logodds now depict the unit increase in log(conc), instead of conc.
log(conc).plot(fit.log, which = c(1, 5))
Outliers are now less of an issue.
moths data frame in the DAAG package contains data from a study of the effect of habitat on the densities of two species of moth (A and P). The quasipoisson distribution is a reasonable model for these data. Try modeling the variable A using this distribution. Is there a better link function than the standard?A.glm <- glm(formula = A ~ log(meters) + factor(habitat), family =
quasipoisson, data = moths)
summary(A.glm)
##
## Call:
## glm(formula = A ~ log(meters) + factor(habitat), family = quasipoisson,
## data = moths)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.5562 -1.4635 -0.0007 0.9248 3.0822
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -15.6958 2096.5884 -0.007 0.994
## log(meters) 0.1292 0.1528 0.845 0.404
## factor(habitat)Disturbed 15.6224 2096.5885 0.007 0.994
## factor(habitat)Lowerside 16.9057 2096.5884 0.008 0.994
## factor(habitat)NEsoak 16.0840 2096.5884 0.008 0.994
## factor(habitat)NWsoak 18.4681 2096.5884 0.009 0.993
## factor(habitat)SEsoak 16.9677 2096.5884 0.008 0.994
## factor(habitat)SWsoak 17.1371 2096.5884 0.008 0.994
## factor(habitat)Upperside 16.7433 2096.5884 0.008 0.994
##
## (Dispersion parameter for quasipoisson family taken to be 2.700801)
##
## Null deviance: 257.108 on 40 degrees of freedom
## Residual deviance: 93.991 on 32 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 13
# Note the huge standard errors
## Consider the square root link as another possibility
A2.glm <- glm(formula = A ~ sqrt(meters) + factor(habitat), family = quasipoisson(link=sqrt), data = moths)
summary(A2.glm)
##
## Call:
## glm(formula = A ~ sqrt(meters) + factor(habitat), family = quasipoisson(link = sqrt),
## data = moths)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.4926 -1.4380 -0.0179 0.8694 3.0841
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.26974 0.88037 -0.306 0.7613
## sqrt(meters) 0.05893 0.06886 0.856 0.3985
## factor(habitat)Disturbed 1.18451 0.89161 1.329 0.1934
## factor(habitat)Lowerside 2.18034 0.86816 2.511 0.0173 *
## factor(habitat)NEsoak 1.41339 0.89452 1.580 0.1239
## factor(habitat)NWsoak 4.86203 0.94902 5.123 1.39e-05 ***
## factor(habitat)SEsoak 2.27990 0.87891 2.594 0.0142 *
## factor(habitat)SWsoak 2.49185 0.95407 2.612 0.0136 *
## factor(habitat)Upperside 1.83905 1.09824 1.675 0.1038
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 2.701906)
##
## Null deviance: 257.108 on 40 degrees of freedom
## Residual deviance: 93.741 on 32 degrees of freedom
## AIC: NA
##
## Number of Fisher Scoring iterations: 10
n <- 1000
x <- runif(n)
sh <- 2
y <- rgamma(n,scale=log(2+3*x)/sh,shape=sh)
Specify the link function and run a glm regressing y on x. Hint: Look at the help file for family. You need to make an object of type “link-glm”, containing the items needed for glm to run: a link function, an inverse link, the derivative of the inverse link, and a function that describes the domain of the inverse link.
## link
linkfun <- function(y) log(exp(y)-1)
## inverse link
linkinv <- function(eta) log(exp(eta)+1)
## derivative of invlink wrt eta
mu.eta <- function(eta) { 1/(exp(-eta) + 1) }
valideta <- function(eta) TRUE
link <- "log(exp(y)-1)"
vv <- structure(list(linkfun = linkfun, linkinv = linkinv,
mu.eta = mu.eta, valideta = valideta,
name = link),
class = "link-glm")
glm(y~x,family=Gamma(link=vv))
##
## Call: glm(formula = y ~ x, family = Gamma(link = vv))
##
## Coefficients:
## (Intercept) x
## 0.2316 1.2625
##
## Degrees of Freedom: 999 Total (i.e. Null); 998 Residual
## Null Deviance: 573.3
## Residual Deviance: 533.4 AIC: 2145
End of Practical