23  Mixed-effects regression

Author
Affiliation

Vladimir Buskin

Catholic University of Eichstätt-Ingolstadt

23.2 Preparation

# Load libraries
library(tidyverse)
library(broom)
library(broom.mixed)
library(tidyr)
library(lme4) # for linear mixed-effects models
library(sjPlot)
library(ggeffects)
library(ggpubr)

# Load data
varmorph <- read.csv("varmorph_data.csv", header = TRUE)

# Reduce data
varmorph %>%
  select(rt, target, prime_type, subj_id) %>%
  filter(prime_type != "filler") %>% 
  drop_na() -> varmorph2

# Overview
glimpse(varmorph2)
Rows: 7,038
Columns: 4
$ rt         <dbl> 599.63, 885.39, 1124.94, 568.68, 726.24, 1095.37, 492.96, 6…
$ target     <chr> "print", "defend", "tempt", "hunt", "staple", "pose", "kick…
$ prime_type <chr> "derived", "derived", "derived", "derived", "derived", "der…
$ subj_id    <chr> "3202", "3202", "3202", "3202", "3202", "3202", "3202", "32…

23.3 Multilevel models

At their core, mixed-effects models “are extensions of regression in which data are structured in groups and coefficients can vary by group” (Gelman and Hill 2007: 237). Typical grouping structures found in linguistic data include speakers, regions, or lexical stimuli for which multiple observations are attested. Normally, such structures would violate the assumption of independence, but can be controlled for by capturing group-wise tendencies.

For illustration, a simple example of a hierarchical dataset is presented in Figure 23.1. If one were to, for instance, measure test scores for every student, it may be of interest how their performance varies not only from student to student but also from school to school. After all, the students are nested within their schools.

graph LR
    A1[School 1] --> B11(Student 1)
    A1[School 1] --> B12(Student 2)
    A1[School 1] --> B13(Student 3)
    
    A2[School 2] --> B21(Student 4)
    A2[School 2] --> B22(Student 5)
    A2[School 2] --> B23(Student 6)
    
Figure 23.1
Task

Read the following (partial) description of the experiments conducted by Ciaccio & Veríssimo on the morphological processing of complex lexical items (2022):

Sixty-nine intermediate to advanced non-native speakers of English (54 women; 15 men) took part in the experiment in exchange for payment or course credits. […] The experiment included 102 English monomorphemic verbs used as targets (e.g., print). These were preceded by their -ed past-tense form (e.g., printed) as the inflected prime, their -er nominalization (e.g., printer) as the derived prime, or by an unrelated prime. Unrelated primes were dissimilar in form and meaning from their corresponding targets; half of them were -ed inflected forms and half of them were -er derived words. (Ciaccio and Veríssimo 2022: 2267)

Inspect varmorph2 and characterise its multilevel structure.

23.3.1 Types of mixed-effects models

Variance across groups can be captured by varying-intercept and/or varying-slope models. These varying coefficients also known as random effects (cf. Gelman and Hill (2007): 245). In the model equation, the intercept \(\alpha\) and/or the slope \(\beta\) is additionally indexed for the grouping factor. Let \(J\) denote the number of groups for \(j = 1, ..., J\).

Varying-intercept model

We allow group-wise variation in the intercept by replacing \(\alpha\) with \(\alpha_{j}\) to indicate the intercept for the \(j\)-th group. It is defined as a random variable and follows the normal distribution. For instance, each participant in the aforementioned psycholinguistic would receive its own intercept rather than a global one for all participants.

\[ Y = \alpha_{j} + \beta_1X_{1} + \beta_2X_{2} + ... + \epsilon \qquad \alpha_{j} \sim N(\mu_{\alpha}, \sigma_{\alpha}^2) \tag{23.1}\]

Varying-slope model

We will allow group-wise variation in the slope coefficients by replacing them with \(\beta_{ij}\) to indicate the slope for the \(j\)-th group. The slope now functions as a random variable and is normally distributed. In the psycholinguistic study, each participant would be assigned its own slope coefficient.

\[ Y = \alpha + \beta_{1j}X_{1} + \beta_{2j}X_{2} + ... + \epsilon \qquad \beta_{j} \sim N(\mu_{\beta}, \sigma_{\beta}^2) \tag{23.2}\]

23.3.2 Example

Assume we are predicting test performance by School. Using simulated data, the following series of plots plots illustrate …

  1. School as a fixed effect,

  2. … random intercepts for each School,

  3. … random slopes for each School, and

  4. … random intercepts and slopes for each School.

23.3.3 Linear mixed-effects models

23.3.4 Application in R

23.3.4.1 Varying-intercept model

# Varying intercept model

# Define reference level for "prime_type"
varmorph2$prime_type <- factor(varmorph2$prime_type, levels = c("unrelated", "derived", "inflected"))

# Fit mixed-effects models
varmorph.me <- lmer(rt ~ prime_type + # fixed effect
                      (1 | subj_id) + # let intercept vary by subject
                      (1 | target), # # let intercept vary by target word
                      data = varmorph2)

# Summarise results
summary(varmorph.me)
tab_model(varmorph.me, show.se = TRUE, show.aic = TRUE, show.dev = TRUE)
  rt
Predictors Estimates std. Error CI p
(Intercept) 715.70 12.66 690.89 – 740.52 <0.001
prime type [derived] -31.31 5.01 -41.14 – -21.48 <0.001
prime type [inflected] -33.96 5.01 -43.78 – -24.13 <0.001
Random Effects
σ2 29483.89
τ00 target 3690.56
τ00 subj_id 7693.61
ICC 0.28
N subj_id 69
N target 102
Observations 7038
Marginal R2 / Conditional R2 0.006 / 0.283
Deviance 92860.321
AIC 92855.624
ICC

The intraclass correlation coefficient (ICC) “ranges from \(0\) if the grouping conveys no information to \(1\) if all members of a group are identical” (Gelman and Hill 2007: 258). In other words, it indicates how much of the variance in the outcome can be explained by the grouping factor (e.g. school or participant).

Show the code
# Extract random effects and their standard errors
ranef_obj <- ranef(varmorph.me)  # Extract random effects with conditional variance
se_ranef <- arm::se.ranef(varmorph.me)           # Extract standard errors for random effects

# Prepare a data frame for 'subj_id' random intercepts with confidence intervals
subj_ranef <- ranef_obj$subj_id  # Random effects for subjects
subj_se <- se_ranef$subj_id      # Standard errors for subjects

# Combine random effects and standard errors into a data frame
subj_df <- data.frame(
  subj_id = rownames(subj_ranef),
  intercept = subj_ranef[, "(Intercept)"],
  se = subj_se[, "(Intercept)"],
  conf.low = subj_ranef[, "(Intercept)"] - 1.96 * subj_se[, "(Intercept)"],
  conf.high = subj_ranef[, "(Intercept)"] + 1.96 * subj_se[, "(Intercept)"]
)

# Create the waterfall plot
ggplot(subj_df, aes(x = reorder(subj_id, intercept), y = intercept)) +
  geom_hline(yintercept = 0, linetype = "dashed", col = "grey10") +
  geom_pointrange(aes(ymin = conf.low, ymax = conf.high)) +
  coord_flip() +
  geom_vline(xintercept = 0, linetype = "dashed", col = "grey10") +
  labs(title = "Random Intercepts by Subject",
       x = "Subject ID",
       y = "Random Intercept") +
  theme_minimal()

23.3.4.2 Varying-slope model

# Varying-slope model; replace 0 with 1 if you want the intercept to vary too
varmorph.me2 <- lmer(rt ~ prime_type +
                      (0 + prime_type | subj_id),
                      data = varmorph2)

summary(varmorph.me2)
tab_model(varmorph.me2, show.se = TRUE, show.aic = TRUE, show.dev = TRUE)
  rt
Predictors Estimates std. Error CI p
(Intercept) 715.70 11.11 693.92 – 737.49 <0.001
prime type [derived] -31.31 5.57 -42.23 – -20.39 <0.001
prime type [inflected] -33.96 5.32 -44.38 – -23.53 <0.001
Random Effects
σ2 33126.00
τ00  
τ00  
τ11 subj_id.prime_typeunrelated 7545.01
τ11 subj_id.prime_typederived 8193.86
τ11 subj_id.prime_typeinflected 7385.08
ρ01  
ρ01  
ICC 0.14
N subj_id 69
Observations 7038
Marginal R2 / Conditional R2 0.006 / 0.141
Deviance 93452.160
AIC 93455.369
Show the code
# Extract random slopes
ranef_data1 <- ranef(varmorph.me2)$subj_id

# Extract standard errors
ranef_data1_se <- arm::se.ranef(varmorph.me2)$subj_id

# Convert data into long format
random_effects_df <- ranef_data1 %>%
  as.data.frame() %>% 
  rownames_to_column(var = "subj_id") %>%
  pivot_longer(cols = -subj_id, 
               names_to = "prime_type", 
               values_to = "random_effect")


# Create a data frame for standard errors
se_df <- as.data.frame(ranef_data1_se) %>%
  rownames_to_column(var = "subj_id") %>%
  pivot_longer(cols = -subj_id, 
               names_to = "prime_type", 
               values_to = "se")


# Combine random effects with standard errors
combined_df <- random_effects_df %>%
  left_join(se_df, by = c("subj_id", "prime_type"))

# Calculate confidence intervals
combined_df <- combined_df %>%
  mutate(lower_ci = random_effect - 1.96 * se,
         upper_ci = random_effect + 1.96 * se)


# Dotplots with confidence intervals
ggplot(combined_df, aes(x = subj_id, y = random_effect, col = prime_type)) +
  coord_flip() +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed", col = "grey10") +
  facet_wrap(~ prime_type) +
  geom_errorbar(aes(ymin = lower_ci, ymax = upper_ci), width = 0.2) +
  labs(title = "Random Effect of Prime Type by Subject ID",
       x = "Random Slope",
       y = "Subject ID") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))