This tutorial is a shortened version of a workshop I gave in February 2023. It focuses mostly on how to fit mixed effects models with R, assuming prior knowledge of what mixed effects models are. For more technical details, please see the slides for the workshop, which can be found here.
Let’s start by loading the packages we will be using throughout this tutorial:
if (!require("pacman")) install.packages("pacman")
## Loading required package: pacman
pacman::p_load(
tidyverse, # for data wrangling
lme4, # to fit mixed effects models
marginaleffects, # to visualize model results
optimx, # to try different optimizers
broom.mixed # to easily extract model results
)
The most important packages here are tidyverse, which allows for easy data wrangling, and lme4,1 which we use to fit mixed effects models.
We will be working with data taken from the American National Election Study (ANES), which is a public opinion survey of American voters fielded every presidential election (and periodically in between).2 The data is publicly available, but I did some cleaning on the combined dataset - which covers multiple elections - to keep only the few variables we will be using throughout this tutorial. You can download the minimal replication data on the Github repository that I created for this project.
Let’s load the data and take a quick peek at it:
anes <- readRDS("./data/anes/anes_recoded.rds"); str(anes)
## 'data.frame': 40081 obs. of 9 variables:
## $ pid : chr "republican" "democrat" "democrat" "democrat" ...
## $ age : int 52 57 61 25 70 67 47 86 52 30 ...
## $ female: num 0 1 1 1 0 0 1 0 1 1 ...
## $ race : chr "white" "black" "black" "white" ...
## $ ft_dem: int 97 60 85 50 70 85 60 85 40 60 ...
## $ ft_rep: int 97 70 70 50 50 70 50 97 70 50 ...
## $ state : chr "OH" "OH" "MI" "LA" ...
## $ year : int 1980 1980 1980 1980 1980 1980 1980 1980 1980 1980 ...
## $ cohort: chr "1920s" "1920s" "1910s" "1950s" ...
Rows correspond to respondents, while columns correspond to answers to survey questions. The data contain nine variables: pid, a categorical variable indicating whether the respondent identifies as a Republican, a Democrat or an independent; age, the respondent’s age; female, coded 1 if the respondent identifies as female, 0 otherwise; race, a categorical variable indicating whether the respondent identifies as black, hispanic, white or other; ft_dem and ft_rep, the respondent’s feeling thermometer scores toward the Democratic and Republican parties respectively (on a 0-97 scale, where higher values indicate warmer feelings); state, a categorical variable indicating the respondent’s state of residence; year, the year in which the respondent was surveyed (always a presidential election year in my trimmed-down dataset); and finally cohort, the decade in which respondents were born.
All of these variables will be used in this tutorial except ft_rep. We will use ft_dem as our outcome variable throughout, but I include ft_rep in the dataset so you can swap the outcome variable and test whether the patterns we find for feelings toward the Democratic party also apply to the Republican party.
Now, before we start fitting mixed effects models, here is a very quick discussion of clustering, which justifies the need for mixed effects models in the first place.
Data is often clustered in nature (e.g., respondents nested within cities/provinces/states/countries; students nested within classrooms/schools; repeated observations of the same participants). There are three approaches to treating clustered data: 1) complete pooling (not accounting for the clustered nature of the data); 2) no pooling (using fixed effects to account for clustering); and 3) partial pooling (using random effects to partially account for clustering).
The complete pooling approach has fallen out of favour over the last decades because of its vulnerability to cluster-specific heterogeneity. It assumes that clustering can be ignored, which is generally only realistic when clusters are very specific (e.g., primary sampling units/cities/counties in surveys). Otherwise, if clustering should not be ignored, the complete pooling approach yields biased estimates given its vulnerability to unobserved, cluster-specific confounders.
On the opposite end of the spectrum, you can use a no pooling approach, which fully accounts for cluster-specific heterogeneity. It assumes that unobserved factors that vary across clusters cannot be ignored and accounts for cluster-specific unobserved confounders. It allows us to make robust inferences about the clusters in the sample, but at the cost of not being able to extrapolate beyond the sample: with fixed effects, our model tends to perform poorly in out-of-sample prediction.
Fortunately, there’s a middle ground between these two approaches, i.e., partial pooling. It assumes that unobserved confounders that vary across clusters cannot be ignored, but puts a probability distribution on inter-cluster variance. Doing so allows us to make inference about the population of clusters when working with only a sample of them. It also allows us to make more credible inference about the sample of clusters when working with imbalanced designs (uneven number of observations across clusters). The general objective of partial pooling is to allow extrapolation beyond the sample, or provide better in-sample predictions when clusters are imbalanced. This latter approach is the focus of this tutorial.
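To make these three approaches concrete, here is a minimal sketch of how each would be specified in R for our running example. This is my own illustration with a single predictor for readability, not models we will interpret:

cp.fit <- lm(ft_dem ~ age, data = anes)                  # complete pooling: clustering ignored
np.fit <- lm(ft_dem ~ age + factor(state), data = anes)  # no pooling: state fixed effects
pp.fit <- lmer(ft_dem ~ age + (1 | state), data = anes)  # partial pooling: state random intercepts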
Let’s specify a simple random intercept model with state as the clustering variable, using lme4. The logic is highly similar to that of lm(), with the main difference being the inclusion of our random effects term within parentheses. Inside the parentheses, we first specify the parameters that vary; then, after |, we specify the variable that corresponds to our clustering unit. In our case, we specify 1 | state because we want the intercept to vary by state. Importantly, the clustering variable must not be included as a fixed effect parameter.
ri.fit <- lmer(ft_dem ~ 1 + age + female + race +
(1 | state),
data = anes)
summary(ri.fit,
correlation = FALSE) # correlation of FE, of little interest
## Linear mixed model fit by REML ['lmerMod']
## Formula: ft_dem ~ 1 + age + female + race + (1 | state)
## Data: anes
##
## REML criterion at convergence: 374073.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2173 -0.5866 0.1089 0.6985 2.2991
##
## Random effects:
## Groups Name Variance Std.Dev.
## state (Intercept) 14.35 3.789
## Residual 659.80 25.687
## Number of obs: 40081, groups: state, 51
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 71.566408 0.770901 92.835
## age 0.027619 0.007459 3.703
## female 5.397808 0.257880 20.932
## racehispanic -12.933167 0.575063 -22.490
## raceother -21.878788 0.707997 -30.902
## racewhite -26.131678 0.410938 -63.590
The Random effects part of the output is what’s specific to mixed effects models. It provides information on the variance of each random effect we specify, along with the residual variance at the unit level (i.e., the error term). Remember that random effects are assumed to follow a normal distribution, so the variance parameter gives us useful information on the shape of their distribution. What this tells us here is that there is some residual variance at the state level, but it is of very small magnitude compared to the residual variance at the unit level. In other words, one’s feeling toward the Democratic party has much more to do with one’s individual characteristics than with the state one lives in, which is intuitive.
The most careful observers will notice in the previous output that we are not given any \(p\)-values. There is a good reason for that. Mixed effects models are still an evolving line of research, and statisticians suggest that the math does not always work out for the \(p\)-values of fixed effects parameters in mixed effects models. Simply put, with unbalanced data, the \(t\)-value of the coefficients does not follow a \(t\)-distribution. This mostly affects models that are either complex or have small sample sizes/numbers of clusters. For relatively simple models with a large sample size and number of clusters, the approximation error vanishes.
Fortunately, there is a solution if you insist on having \(p\)-values. The lmerTest package allows you to obtain them but, based on what I just mentioned, they are somewhat too optimistic (by how much will depend on your data and model). If you want my two cents, I think - along with the American Statistical Association and the editorial board of The American Statistician - that we shouldn’t adhere to strict cut-off values of statistical significance anyway.3 So compute confidence intervals around your coefficients of interest as you usually would, and if they come close to overlapping 0, you may want to be a bit more conservative in your interpretation than you typically are. Whether we like it or not, data science is still more of an art than a science sometimes.
So, to obtain \(p\)-values, we simply need to load lmerTest and re-fit our model:
pacman::p_load(lmerTest)
ri.fit <- lmer(ft_dem ~ 1 + age + female + race +
(1 | state),
data = anes)
summary(ri.fit,
correlation = FALSE)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ft_dem ~ 1 + age + female + race + (1 | state)
## Data: anes
##
## REML criterion at convergence: 374073.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2173 -0.5866 0.1089 0.6985 2.2991
##
## Random effects:
## Groups Name Variance Std.Dev.
## state (Intercept) 14.35 3.789
## Residual 659.80 25.687
## Number of obs: 40081, groups: state, 51
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 7.157e+01 7.709e-01 1.321e+02 92.835 < 2e-16 ***
## age 2.762e-02 7.459e-03 4.007e+04 3.703 0.000213 ***
## female 5.398e+00 2.579e-01 4.003e+04 20.932 < 2e-16 ***
## racehispanic -1.293e+01 5.751e-01 3.893e+04 -22.490 < 2e-16 ***
## raceother -2.188e+01 7.080e-01 3.972e+04 -30.902 < 2e-16 ***
## racewhite -2.613e+01 4.109e-01 3.870e+04 -63.590 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From here onward, our summary() outputs will always include \(p\)-values, since we have loaded the lmerTest package.
We can extract the random effects using ranef(). Here, I specify drop = TRUE because I want the random effects as a vector rather than a list, and I ask ranef() not to give me their conditional variance.
ranef(ri.fit,
drop = TRUE, # to have them as a vector
condVar = FALSE) # I don't want to see their conditional variance
## $state
## AK AL AR AZ CA CO
## -3.07464917 -1.26297978 1.08795722 -3.70950275 1.50813827 0.96262682
## CT DC DE FL GA HI
## 1.96444265 6.65816389 2.75901310 -1.95629794 -2.44979127 -1.16044515
## IA ID IL IN KS KY
## 4.52486037 -8.70644221 1.03627535 1.04953812 -1.14983294 -3.27421770
## LA MA MD ME MI MN
## -2.92352174 7.02646252 1.18426898 -0.05952993 2.48566676 5.06900426
## MO MS MT NC ND NE
## -1.18768420 -4.90346053 -3.51217272 0.72108195 -6.68354831 -2.51506907
## NH NJ NM NV NY OH
## 2.71232031 1.43522945 -0.81261732 -5.66360585 3.11575342 1.12058782
## OK OR PA RI SC SD
## -6.14981248 4.05283134 1.72021597 -1.39243886 -4.37419371 -2.91796297
## TN TX UT VA VT WA
## 0.54850925 -0.91995700 -1.31668321 1.03263543 2.06891206 3.87866081
## WI WV WY
## 1.83008446 6.37899892 4.14417731
To make the visualization more interesting, we can also use a plot. The tidy() function from the broom.mixed package allows us to extract model results as a tibble, which is convenient to work with, among other reasons because we can use it in ggplot:
broom.mixed::tidy(ri.fit,
effects = "ran_vals") %>% # to extract only the random effects
ggplot(aes(x = reorder(level, -estimate), y = estimate)) +
geom_hline(yintercept = 0, color = "grey70") +
geom_point(color = "cyan4") +
geom_errorbar(aes(ymin = estimate - 1.96*std.error, ymax = estimate + 1.96*std.error),
width = 0, color = "cyan4") +
labs(x = "State", y = "Random effect") +
scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
theme_classic() +
coord_flip()
An important point to mention about random effects: do not interpret their prediction intervals as confidence intervals. Notions of statistical significance are inappropriate here. Since random effects are not fixed parameters, we can’t say that we “estimate” them. Thus, rather than referring to them as estimates, we commonly refer to them as the Best Linear Unbiased Predictors (BLUPs) of the cluster-level effects, while the estimated variances themselves are typically called “variance components”.
You can also estimate a model with crossed random effects, i.e., random effects for multiple clustering variables that are not nested within one another. We will do so by estimating a random intercept model on our ANES data with one variance component at the state level and another at the cohort level (i.e., respondents’ decade of birth). These random effects are crossed because respondents from a given state may belong to different cohorts and vice versa.
Importantly, given that we now have two distinct clustering units, we also need two sets of parentheses, calling inside each set the variables that vary for that clustering unit. Here, only our intercept varies by state and by cohort, but we could theoretically have more than one varying parameter per cluster. Also, the random effects terms do not have to be the same for each clustering unit: we could have the age variable varying by state and the female variable varying by cohort (along with our intercept in each cluster), as in the sketch below. The possibilities are endless!
crossed.re.fit <- lmer(ft_dem ~ 1 + age + female + race +
(1 | state) + (1 | cohort),
data = anes)
summary(crossed.re.fit,
correlation = FALSE)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ft_dem ~ 1 + age + female + race + (1 | state) + (1 | cohort)
## Data: anes
##
## REML criterion at convergence: 371990.9
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.6244 -0.6225 0.0787 0.7242 2.4227
##
## Random effects:
## Groups Name Variance Std.Dev.
## state (Intercept) 7.763 2.786
## cohort (Intercept) 194.163 13.934
## Residual 625.672 25.013
## Number of obs: 40081, groups: state, 51; cohort, 13
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 9.405e+01 4.021e+00 1.292e+01 23.39 5.88e-12 ***
## age -3.454e-01 1.097e-02 3.734e+04 -31.48 < 2e-16 ***
## female 5.146e+00 2.513e-01 4.002e+04 20.48 < 2e-16 ***
## racehispanic -1.145e+01 5.608e-01 3.638e+04 -20.42 < 2e-16 ***
## raceother -1.862e+01 6.933e-01 3.931e+04 -26.86 < 2e-16 ***
## racewhite -2.652e+01 3.996e-01 3.624e+04 -66.36 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This time, we see two random intercepts (one varying by state and one by cohort). The residual variance at the cohort level is much greater than the variance at the state level, but this could be misleading: given that we have only 13 cohorts, the model is likely to overestimate the cross-cohort variance. Still, the variance at the unit level remains much greater.
Let’s look at our crossed random effects using a plot, similarly to what we did above:
bind_rows(
predictions(crossed.re.fit,
by = "state") %>%
dplyr::select(state, estimate, conf.low, conf.high) %>%
mutate(re = "State") %>%
rename(value = state),
predictions(crossed.re.fit,
by = "cohort") %>%
dplyr::select(cohort, estimate, conf.low, conf.high) %>%
mutate(re = "Cohort") %>%
rename(value = cohort)
) %>%
ggplot(aes(x = reorder(value, -estimate), y = estimate)) +
geom_point(aes(col = re)) +
geom_errorbar(aes(ymin = conf.low, ymax = conf.high, col = re),
width = 0) +
labs(x = "Random effect", y = "Predicted Democratic FT score") +
scale_color_manual("", values = c("cyan4", "royalblue4")) +
scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
theme_classic() +
theme(legend.position = "none") +
facet_wrap(. ~ re,
scales = "free_y") +
coord_flip()
Let’s come back to our initial random intercept model and add a random slope for age to it, with the slope of age also varying by state.
rs.fit <- lmer(ft_dem ~ 1 + age + female + race +
(1 + age | state),
data = anes)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
## Warning: Model failed to converge with 1 negative eigenvalue: -1.4e+01
Oooops, we have a scary-looking issue! Convergence issues are relatively frequent when using mixed effects models given the fairly intricate computations that they involve. Fortunately, most of the time there exist relatively simple fixes.
The estimation of mixed effects models can run into issues when predictors are on vastly different scales. This is exactly our case here: we have two categorical predictors (gender and race) and a continuous predictor (age), with the latter on a very different scale than the former. This complicates the optimization of our likelihood function, as the partial derivative with respect to the continuous predictor (age) will dominate the gradient and either make the algorithm very slow to converge or prevent it from converging altogether. Let’s standardize age and see whether that solves the issue.
anes = anes %>%
mutate(age.s = (age - mean(age)) / sd(age))
rs.fit <- lmer(ft_dem ~ 1 + age.s + female + race +
(1 + age.s | state),
data = anes)
summary(rs.fit,
correlation = FALSE)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ft_dem ~ 1 + age.s + female + race + (1 + age.s | state)
## Data: anes
##
## REML criterion at convergence: 374050.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.1773 -0.5924 0.1110 0.7042 2.2798
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## state (Intercept) 13.8916 3.7271
## age.s 0.7129 0.8444 0.86
## Residual 659.3529 25.6779
## Number of obs: 40081, groups: state, 51
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 72.9229 0.6850 87.5798 106.451 <2e-16 ***
## age.s 0.3866 0.1860 32.0935 2.078 0.0458 *
## female 5.3932 0.2578 40028.3769 20.918 <2e-16 ***
## racehispanic -12.9590 0.5731 18448.0355 -22.611 <2e-16 ***
## raceother -21.8547 0.7073 38446.1105 -30.900 <2e-16 ***
## racewhite -26.1399 0.4101 35086.4318 -63.733 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This time, the optimizer could evaluate the gradient thanks to our more “workable” distribution of predictors, which shows why you should always scale your predictors before estimating a mixed effects model. In some cases, though, that may not be enough, and the maximum absolute gradient of your model could still be larger than the default tolerance level (.002) - i.e., there was still room for improvement in the model fit when the algorithm stopped. Let’s see how we could solve that.
There exist many optimization algorithms to maximize a function. They all have different internal mechanics, some being more robust to complex multivariate functions (i.e., more likely to converge), others being faster. lme4 can use quite a few. Let’s see whether different optimizers all allow our model to converge. To do so, we need to use the function allFit():
optims <- allFit(rs.fit,
verbose = FALSE) # don't report progress
## Loading required namespace: dfoptim
## boundary (singular) fit: see help('isSingular')
## Warning: Model failed to converge with 1 negative eigenvalue: -2.7e+05
## boundary (singular) fit: see help('isSingular')
## Warning: Model failed to converge with 1 negative eigenvalue: -1.9e+04
# code below was taken from: https://joshua-nugent.github.io/allFit/
optims_OK <- optims[sapply(optims, is, "merMod")]
lapply(optims_OK, function(x) x@optinfo$conv$lme4$messages)
## $bobyqa
## NULL
##
## $Nelder_Mead
## NULL
##
## $nlminbwrap
## NULL
##
## $nmkbw
## NULL
##
## $`optimx.L-BFGS-B`
## [1] "boundary (singular) fit: see help('isSingular')"
##
## $nloptwrap.NLOPT_LN_NELDERMEAD
## [1] "boundary (singular) fit: see help('isSingular')"
##
## $nloptwrap.NLOPT_LN_BOBYQA
## NULL
The output above presents the warning messages we get when using each optimizer to fit our model. NULL means that there is no convergence warning. So in our case, all but the fifth and sixth optimizers provide reliable results.
Let’s estimate our random intercept, random slope model with an optimizer of our choice. To do so, we need to specify it inside lmerControl():
rs.fit.bob <- lmer(ft_dem ~ 1 + age.s + female + race +
(1 + age.s | state),
control = lmerControl(optimizer = "bobyqa"),
data = anes)
summary(rs.fit.bob,
correlation = FALSE)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: ft_dem ~ 1 + age.s + female + race + (1 + age.s | state)
## Data: anes
## Control: lmerControl(optimizer = "bobyqa")
##
## REML criterion at convergence: 374050.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.1773 -0.5924 0.1110 0.7042 2.2798
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## state (Intercept) 13.8913 3.7271
## age.s 0.7129 0.8443 0.86
## Residual 659.3529 25.6779
## Number of obs: 40081, groups: state, 51
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 72.9229 0.6850 87.5824 106.451 <2e-16 ***
## age.s 0.3866 0.1860 32.0949 2.078 0.0458 *
## female 5.3932 0.2578 40028.3766 20.918 <2e-16 ***
## racehispanic -12.9590 0.5731 18450.3744 -22.611 <2e-16 ***
## raceother -21.8547 0.7073 38446.3034 -30.900 <2e-16 ***
## racewhite -26.1399 0.4101 35086.8181 -63.733 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, we see two random effects per cluster: a random intercept and a random slope for age. The variance of the age slopes is much smaller than that of the intercepts, but it is expressed on the scale of the fixed effect coefficient. Given the small magnitude of the fixed effect slope for age (.39), a slope standard deviation of .84 (variance of .71) is very meaningful: it implies that some state-specific slopes are positive while others are negative. Also note the Corr column, which indicates the correlation of the random effects. Remember that these are assumed to follow a multivariate normal distribution, so this parameter informs us about their estimated covariance.
We can use what people often call a “spaghetti plot” to visualize our results:
# creating a dataset to generate predictions
data.preds = expand.grid(age.s = unique(anes$age.s),
state = unique(anes$state),
female = 1,
race = "hispanic")
# predicting
data.preds$preds = predict(rs.fit.bob,
newdata = data.preds)
# plotting the predictions
ggplot(data.preds, aes(x = age.s, y = preds, group = state)) +
geom_line(color = "cyan4") +
labs(x = "Age (z-score)", y = "Predicted Democratic FT score") +
theme_classic() +
theme(legend.position = "none")
We see a funnel-like shape because the two random effects are strongly correlated (.86): the states with the largest intercepts also tend to have the most positive slopes for age.s, and vice versa. Substantively, the difference in Democratic party FT scores across states appears larger among older voters than among younger ones, which could potentially reflect the impact of political socialization. Further, as we noticed in the previous output, some slopes are positive while others are negative. It is up to you to find such pieces of information and discuss their substantive significance.
When estimating models with many random effects, it can be interesting to investigate their joint distribution. Doing so can sometimes provide valuable substantive insights. lme4 always assumes them to follow a multivariate normal distribution. Bayesian approaches allow you to deviate from that assumption, but this is beyond the scope of this tutorial. Let’s look at an example with the model we just estimated.
data.frame(ranef(rs.fit.bob)) %>%
dplyr::select(-c(grpvar, condsd)) %>%
pivot_wider(names_from = term, values_from = condval) %>%
rename(state = grp,
intercept = `(Intercept)`) %>%
ggplot(aes(x = age.s, y = intercept)) +
geom_vline(xintercept = 0, color = "grey70") +
geom_hline(yintercept = 0, color = "grey70") +
geom_point(color = "cyan4") +
labs(x = "RE (age)", y = "RE (intercept)") +
theme_classic()
This plot reflects the correlation of the random effects indicated in
the summary()
output. We can see a very strong positive
correlation between the two random effects, indicating that clusters
with a positive random intercept also tend to have a positive random
slope for age, and vice versa for clusters with negative random
effects.
Likelihood ratio tests (LRTs) allow us to test whether the improved fit of our model - when including additional parameters - is “worth” its added complexity (from a statistical point of view). Importantly, an LRT can only compare models that are nested within one another, meaning that all the parameters of the simpler model must be a subset of the parameters of the expanded model(s).
Considering that random effects cannot be interpreted in terms of \(p\)-values or standard errors, it is appropriate to use a likelihood ratio test to validate whether their inclusion in a model is statistically warranted. This allows us to test whether the random effects, as a group (and considered as a single parameter), have a variance that is statistically significantly different from zero. When comparing a single-level regression with a random intercept model, the null hypothesis being tested is that the variance of the cluster means is equal to zero, in which case the random intercepts should not be included in the model:
\[\text{H}_{0}: \sigma^2_{\mu} = 0 \\ \text{H}_{1}: \sigma^2_{\mu} \neq 0\]
A statistically significant result indicates that the added complexity of the more comprehensive model leads to a significant improvement in the model fit (i.e., the improved model accuracy is “worth” the loss of degrees of freedom).
Let’s compare our random intercept model to our model including both a random intercept and a random slope. To do so, we need to re-fit our models using maximum likelihood - rather than restricted maximum likelihood, the default estimation criterion in lme4 - and use the anova() function to conduct the LRT:
# re-fit the random slope model with ML
rs.fit.bob <- update(rs.fit.bob, REML = FALSE)
# conduct LRT; anova() automatically re-fits any remaining REML model
# with ML, hence the message below
anova(ri.fit, rs.fit.bob) # put the simpler model first
## refitting model(s) with ML (instead of REML)
## Data: anes
## Models:
## ri.fit: ft_dem ~ 1 + age + female + race + (1 | state)
## rs.fit.bob: ft_dem ~ 1 + age.s + female + race + (1 + age.s | state)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## ri.fit 8 374083 374152 -187033 374067
## rs.fit.bob 10 374069 374155 -187025 374049 17.683 2 0.0001446 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Given the low \(p\)-value registered by our test, we can conclude that the results support our use of a more complex model including both random intercepts and slopes.
Remember our discussion of correlated vs uncorrelated random effects? The choice can be guided by empirics, too. Since uncorrelated random effects models do not include a covariance parameter for random effects, they are actually a simpler version of correlated random effects models, which allows us to use an LRT to compare them:
# fit the uncorrelated model (omitted earlier in this shortened tutorial)
rs.fit.bob.uncorr <- lmer(ft_dem ~ 1 + age.s + female + race +
                            (1 | state) + (0 + age.s | state),
                          data = anes, REML = FALSE)
rs.fit.bob <- update(rs.fit.bob, REML = FALSE)
anova(rs.fit.bob.uncorr, rs.fit.bob) # put the simpler model first
## Data: anes
## Models:
## rs.fit.bob.uncorr: ft_dem ~ 1 + age.s + female + race + (1 | state) + (0 + age.s | state)
## rs.fit.bob: ft_dem ~ 1 + age.s + female + race + (1 + age.s | state)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## rs.fit.bob.uncorr 9 374077 374154 -187030 374059
## rs.fit.bob 10 374069 374155 -187025 374049 9.7899 1 0.001755 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results again support our use of a more complex model including a covariance parameter for the random effects. So it appears that our more complex model, with a random intercept, a random slope for age and a correlation parameter for the random effects, provides the best fit.
The main limitation I can think of when using lme4 is its inability to account for autocorrelation, which is a non-negligible issue when using panel data. The alternative package nlme allows you to do so.
There are many extensions of mixed effects models that were not covered here. Most importantly, it is possible to include multiple random slopes in a model (be careful not to overfit the data, though). Also, I only focused on models including two levels (units nested within clusters). Random effects models can also accommodate multiple levels of clustering (units nested within clusters that are themselves nested within broader clusters).6 But these models are very statistically demanding; you need a lot of data to estimate them (again, be careful not to overfit the data). Another important extension is lme4’s capacity to accommodate outcome variables following a large variety of distributions, including discrete and count outcome variables. These models are known as generalized linear mixed models (GLMMs) and can also be estimated using lme4, as sketched below. Finally, Bayesian methods can be useful to further regularize the random effects and speed up computation.
Have fun using mixed effects models and please reach out to me if you have any comments on this tutorial!
Bates D., Mächler M., Bolker B. and Walker S. (2015). “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software, 67(1), 1–48.↩︎
Wasserstein, Ronald L., Allen L. Schirm and Nicole A. Lazar (2019). “Moving to a World Beyond ‘p<.05’.” The American Statistician, 73(sup1): 1-19.↩︎
For an extended discussion of correlated vs uncorrelated random effects, see https://rpubs.com/yjunechoe/correlationsLMEM.↩︎
Again, see https://rpubs.com/yjunechoe/correlationsLMEM for more details on this. The author does a very good job of simplifying a pretty technical topic.↩︎
A typical example of that is the case of students (first level) who are nested within classrooms (second level) that are nested within schools (third level).↩︎