Statistics for Corpus Linguists
  • Overview
  • Fundamentals
    • 1.1 Basics
    • 1.2 Research Questions
    • 1.3 Linguistic Variables
    • 1.4 Formal Aspects
  • Introduction to R
    • 2.1 First steps
    • 2.2 Exploring R Studio
    • 2.3 Vectors
    • 2.4 Data frames
    • 2.5 Libraries
    • 2.6 Importing/Exporting
  • NLP
    • 3.1 Concordancing
    • 3.2 Regular expressions
    • 3.3 Data Annotation
  • Statistics
    • 4.1 Data, variables, samples
    • 4.2 Probability theory
    • 4.3 Categorical data
    • 4.4 Continuous data
    • 4.5 Hypothesis testing
    • 4.6 Binomial test
    • 4.7 Chi-squared test
    • 4.8 t-test
  • Models
    • 6.1 Linear regression
    • 6.2 Logistic regression
    • 6.3 Mixed-effects regression
    • 6.4 Poisson regression
    • 6.5 Ordinal regression
  • Machine Learning
    • 7.1 Tree-based methods
    • 7.2 Gradient boosting
    • 7.3 PCA
    • 7.4 EFA
    • 7.5 Clustering
  1. 6. Statistical Modelling
  2. 6.5 Ordinal regression
  • 6. Statistical Modelling
    • 6.1 Linear regression
    • 6.2 Logistic regression
    • 6.3 Mixed-effects regression
    • 6.4 Poisson regression
    • 6.5 Ordinal regression

On this page

  • Suggested reading
  • Introduction
  • Descriptive overview
  • Modelling ordinal data
    • Ordered logistic regression
  • Application in R
    • Using polr() from the MASS library
    • Testing assumptions and goodness of fit
    • Visualisation
    • Using clm() from the ordinal library
    • Mixed-effects ordinal regression
  • Generalised Additive Mixed-effects Models (GAMMs)
    • Suggested reading
    • Rationale
    • Application in R
  1. 6. Statistical Modelling
  2. 6.5 Ordinal regression

6.5 Ordinal regression

Author
Affiliation

Vladimir Buskin

Catholic University of Eichstätt-Ingolstadt

Abstract
Modelling categorical (ordinal) response variables.

Suggested reading

General:

Baguley (2012): Chapter 17.4.5

O’Connell (2006)

Powers and Xie (2008): Chapter 7

Documentation of Cumulative Link Models

Introduction

In her recent contribution, Glass (2021) examines possible reasons why certain transitive verbs have a stronger affinity towards object omission compared to others, placing special emphasis on the routinisation of the actions denoted by the verbs. Specifically, she assesses how high/low-routine contexts affect the acceptability of object omission for transitive verbs from different frequency bins.

We will replicate her findings using her survey data Glass_2021_survey_processed.csv1:

  • 1 The original dataset can be retrieved from Lelia Glass’s OSF repository: https://osf.io/t6zw5 [Last accessed: 27th September, 2024].

  • # Load libraries
    library(tidyverse)
    ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
    ✔ dplyr     1.1.4     ✔ readr     2.1.4
    ✔ forcats   1.0.0     ✔ stringr   1.5.0
    ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
    ✔ lubridate 1.9.2     ✔ tidyr     1.3.1
    ✔ purrr     1.0.4     
    ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ✖ dplyr::filter() masks stats::filter()
    ✖ dplyr::lag()    masks stats::lag()
    ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
    library(ordinal)
    
    Attaching package: 'ordinal'
    
    The following object is masked from 'package:dplyr':
    
        slice
    library(MASS)
    
    Attaching package: 'MASS'
    
    The following object is masked from 'package:dplyr':
    
        select
    library(sjPlot)
    Learn more about sjPlot with 'browseVignettes("sjPlot")'.
    library(effects)
    Loading required package: carData
    lattice theme set by effectsTheme()
    See ?effectsTheme for details.
    library(ggeffects)
    library(ggpubr)
    
    # For additional tests
    library(DescTools)
    library(generalhoslem)
    Loading required package: reshape
    
    Attaching package: 'reshape'
    
    The following object is masked from 'package:lubridate':
    
        stamp
    
    The following object is masked from 'package:dplyr':
    
        rename
    
    The following objects are masked from 'package:tidyr':
    
        expand, smiths
    library(brant)
    
    # Load data
    survey <- read.csv("../datasets/Glass_2021_survey_processed.csv")
    
    # Inspect dataset
    str(survey)
    'data.frame':   784 obs. of  5 variables:
     $ rating       : int  3 4 3 3 2 4 3 2 3 2 ...
     $ freq         : chr  "hi" "lo" "lo" "hi" ...
     $ verb         : chr  "pick" "catch" "throw" "break" ...
     $ ParticipantID: chr  "Participant1" "Participant1" "Participant1" "Participant1" ...
     $ routine      : chr  "hi" "lo" "lo" "lo" ...
    head(survey)
      rating freq   verb ParticipantID routine
    1      3   hi   pick  Participant1      hi
    2      4   lo  catch  Participant1      lo
    3      3   lo  throw  Participant1      lo
    4      3   hi  break  Participant1      lo
    5      2   hi  taste  Participant1      hi
    6      4   hi bottle  Participant1      hi
    Short breakdown of the variables
    • routine: In Glass’s study, transitive verbs were randomly assigned to one of the following conditions:
    • (High routine condition:) I worked at my poultry farm. Just like I always do, I butchered some chickens. Then I gathered some eggs.

    • (Low-routine condition:) I visited a friend’s job. Just because people wanted me to try it, I butchered some chickens. Then I went for a walk.

    Cf. Glass (2021: 66)

    unique(survey$routine)
    [1] "hi" "lo"
    • rating records the responses of participants to a follow-up question regarding the acceptability of object omission. The answers are recorded on a 1-5 Likert scale.

    The next time Caroline talks about butchering chickens the day before, how likely do you think she is to say the following?

    ‘I butchered yesterday’

    Cf. Glass (2021: 66)

    unique(survey$rating)
    [1] 3 4 2 5 1
    • verb contains the items to be rated for the conditions in routine
    unique(survey$verb)
    [1] "pick"   "catch"  "throw"  "break"  "taste"  "bottle" "sell"   "chop"  
    • frequency relates to the frequency bins of the verbs:
    unique(survey$freq)
    [1] "hi" "lo"
    • ParticipantID identifies each of the 98 subjects who provided ratings

    Descriptive overview

    Show the code
    ggplot(survey, aes(x = rating, fill = freq)) +
      geom_density(alpha = 0.7) +
      facet_wrap(~freq) +
      theme_minimal() +
      labs(title = "Density of Ratings by Frequency",
           x = "Rating", y = "Density", fill = "Frequency")

    Show the code
    ggplot(survey, aes(x = rating, fill = routine)) +
      geom_density(alpha = 0.7) +
      facet_wrap(~routine) +
      theme_minimal() +
      labs(title = "Density of Ratings by Routine",
           x = "Rating", y = "Density", fill = "Routine")

    Show the code
    library(ggridges)
    ggplot(survey, aes(x = rating, y = verb, fill = verb)) +
      geom_density_ridges(scale = 5, rel_min_height = 0.01, alpha = 0.6) +
      theme_ridges() +
      #theme(legend.position = "none") +
      labs(title = "Distribution of Ratings by Verb",
           x = "Rating", y = "Verb")
    Picking joint bandwidth of 0.381

    Modelling ordinal data

    Our task is clear: We need to measure how routine and freq affect the variability in the acceptability ratings, while controlling for repeated measurements for verb and ParticipantID, which impose a hierarchical structure on the dataset.

    Formally speaking, we have \(p\) explanatory variables \(X_1, X_2, ..., X_p\) for \(1, ..., p\). The target variable, i.e. our \(Y\), is rating with the ordered, discrete outcomes \(y \in \{1, 2, 3, 4, 5\}\).

    The goal is to find a model \(f\) that describes the relationship between \(Y\) and \(X_p\) as accurately as possible and minimises the error term \(\epsilon\):

    \[ Y = f(X_1, X_2, ..., X_p) + \epsilon \tag{1}\]

    Ordered logistic regression

    One family of models that respects the ordered, yet categorical nature of \(Y\) is ordered (or ordinal) logistic regression. Other terms include proportional odds models and cumulative logit/link models.

    Recap: Logistic regression

    Logistic regression is used to model categorical response variables with two or more levels. For instance, let’s assume our \(Y\) is dichotomous with the following two outcomes:

    \[ Y = \begin{cases} \text{yes} \\ \text{no} \end{cases} \tag{2}\]

    Using the logistic function, we can estimate the probability of one outcome versus the other given the predictors \(X_p\). Their log-transformed odds ratio (log odds) is equivalent of the all-too-familiar linear model:

    \[ \log\left(\frac{P(Y = yes \mid X_1, X_2, ..., X_p)}{1 - P(Y = yes \mid X_1, X_2, ..., X_p)}\right) = \beta_0 + \sum_{i=1}^p \beta_iX_i \tag{3}\]

    Core to this approach is the notion of cumulative probabilities. Let \(J\) denote the number of ordered categories in \(Y\). In Glass’s case study, the estimated cumulative probabilities for each ordered outcome (= acceptability rating) would have the forms in Equation 4.

    \[ \begin{array}{rcl} P(Y \leq 1) & = & P(Y = 1) \\ P(Y \leq 2) & = & P(Y = 1) + P(Y = 2) \\ P(Y \leq 3) & = & P(Y = 1) + P(Y = 2) + P(Y = 3) \\ & \vdots & \\ P(Y \leq j) & = & P_1 + ... + P_j \end{array} \tag{4}\]

    We can now update our logistic regression model to take into account cumulative probabilities for \(j = 1, ..., J-1\).

    \[ \log\left(\frac{P(Y \leq j \mid X_1, X_2, ..., X_p)}{1 - P(Y \leq j \mid X_1, X_2, ..., X_p)}\right) = \beta_0 + \sum_{i=1}^p \beta_iX_i \tag{5}\]

    The intercepts \(\beta_{0_j}\) serve as cutpoints between the adjacent ordinal categories. For \(J = 5\) categories, there are \(J - 1 = 4\) cutpoints, i.e.,

    • 1|2 for \(P(Y \leq 1)\)

    • 2|3 for \(P(Y \leq 2)\)

    • 3|4 for \(P(Y \leq 3)\)

    • 4|5 for \(P(Y \leq 4)\).

    Given a change in predictor values, the slope coefficients \(\beta_pX_p\) indicate how the probability of being in a higher rating category changes (Baguley 2012: 691–2).

    We can obtain “regular” probabilities from the cumulative ones by drawing on the equivalence in Equation 6.

    \[ P(Y = j) = P(Y \leq j) - P(Y \leq j - 1) \tag{6}\]

    For instance, the probability \(P(Y = 3)\) is equivalent to

    \[ P(Y = 3) = P(Y \leq 3) - P(Y \leq 2). \tag{7}\]

    Assumptions of proportional odds models

    The proportional odds assumption stipulates a stable effect of the predictors on the (log) odds of the ordinal outcomes across all possible cutpoints (O’Connell 2006: 29). In case of violation, it is better to rely on partial proportional odds models or multinomial logistic regression instead.

    Application in R

    There are several R packages that support ordinal logistic regression models. This section provides an overview of some of the more common (as well as well-documented) implementations.

    Using polr() from the MASS library

    # Convert to factor and determine ordering
    survey$rating <- factor(survey$rating, ordered = TRUE, levels = c("1", "2", "3", "4", "5"))
    
    # Fit polr model
    survey.polr <- polr(rating ~ 
                          freq +
                          routine,
                          data = survey)
    
    # Model summary
    summary(survey.polr)
    
    Re-fitting to get Hessian
    Call:
    polr(formula = rating ~ freq + routine, data = survey)
    
    Coefficients:
                 Value Std. Error  t value
    freqlo    -0.01095     0.1291 -0.08483
    routinelo -0.55521     0.1302 -4.26449
    
    Intercepts:
        Value   Std. Error t value
    1|2 -0.8704  0.1228    -7.0859
    2|3  0.1342  0.1188     1.1290
    3|4  1.2528  0.1293     9.6856
    4|5  2.8915  0.2003    14.4345
    
    Residual Deviance: 2246.662 
    AIC: 2258.662 
    # R-squared and AIC
    PseudoR2(survey.polr, c("Nagelkerke", "AIC"))
      Nagelkerke          AIC 
       0.0244793 2258.6617419 
    tab_model(survey.polr, show.se = TRUE, show.aic = TRUE, show.dev = TRUE, transform = NULL)
    
    Re-fitting to get Hessian
      rating
    Predictors Log-Odds std. Error CI p
    1|2 -0.87 0.12 -1.11 – -0.63 <0.001
    2|3 0.13 0.12 -0.10 – 0.37 0.259
    3|4 1.25 0.13 1.00 – 1.51 <0.001
    4|5 2.89 0.20 2.50 – 3.28 <0.001
    freq [lo] -0.01 0.13 -0.26 – 0.24 0.932
    routine [lo] -0.56 0.13 -0.81 – -0.30 <0.001
    Observations 784
    R2 Nagelkerke 0.024
    Deviance 2246.662
    AIC 2258.662
    Interpreting the model parameters
    • Coefficients: The conditions freqlo (low frequency) and routinelo (low-routine context) both have negative values, which means that both of them decrease the probability of obtaining a higher acceptability rating (compared to freqhi and routinehi).

    • Intercepts: These represent the cutpoints between the ordinal categories, which are necessary for calculating the probabilities of each ordinal category.

    Testing assumptions and goodness of fit

    • Test proportional odds assumption:
    brant(survey.polr) # p < 0.05 is a violation of the assumption
    -------------------------------------------- 
    Test for    X2  df  probability 
    -------------------------------------------- 
    Omnibus     14.45   6   0.02
    freqlo      6.5 3   0.09
    routinelo   8.14    3   0.04
    -------------------------------------------- 
    
    H0: Parallel Regression Assumption holds
    • Hosmer-Lemeshow test, which is essentially a \(\chi^2\)-test:
    logitgof(survey$rating, # observed
             fitted(survey.polr), # expected
             ord = TRUE) # respect ordering
    
        Hosmer and Lemeshow test (ordinal model)
    
    data:  survey$rating, fitted(survey.polr)
    X-squared = 51.173, df = 35, p-value = 0.03808
    • The Lipsitz test is an extension of the Hosmer-Lemeshow test. Note that it requires the response to be a factor.
    lipsitz.test(survey.polr)
    
        Lipsitz goodness of fit test for ordinal response models
    
    data:  formula:  rating ~ freq + routine
    LR statistic = 20.261, df = 9, p-value = 0.01637
    • Part of the same family of tests is the Pulkstenis-Robinson test, which also relies on the \(\chi^2\)-distribution:
    pulkrob.chisq(survey.polr, catvars = c("freq", "routine"))
    
        Pulkstenis-Robinson chi-squared test
    
    data:  formula:  rating ~ freq + routine
    X-squared = 21.476, df = 9, p-value = 0.0107

    Visualisation

    With effects

    # Routine effect plot
    plot(Effect(focal.predictors = c("routine"), mod = survey.polr), rug = FALSE, style="stacked")

    With ggeffects and ggplot2

    Show the code
    # Get the ggeffects data
    eff <- ggeffects::ggeffect(survey.polr, "freq")
    
    # Convert to a data frame
    plot_data <- as.data.frame(eff)
    
    # Ensure the response.level has the desired levels
    plot_data$response.level <- factor(plot_data$response.level, 
                                       levels = c("X1", "X2", "X3", "X4", "X5"),
                                       labels = c("1", "2", "3", "4", "5"))
    
    # Create the plot with confidence intervals
    p1 <- ggplot(plot_data, aes(x = x, y = predicted, color = response.level, group = response.level)) +
      geom_line(linewidth = 1) +
      geom_point(size = 3) +
      geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.1, linewidth = 0.5) +
      scale_color_viridis_d() +
      labs(
        x = "Frequency",
        y = "Predicted Probability",
        color = "Rating"
      ) +
      facet_grid(~ response.level) +
      theme_minimal() +
      theme(
        legend.position = "right",
        panel.grid.minor = element_blank(),
        axis.title = element_text(face = "bold"),
        plot.title = element_text(face = "bold"),
        plot.subtitle = element_text(margin = margin(b = 10))
      ) +
      scale_y_continuous(labels = scales::percent_format(accuracy = 1))
    
    p1

    Show the code
    # Get the ggeffects data for "routine"
    eff_routine <- ggeffects::ggeffect(survey.polr, "routine")
    
    # Convert to a data frame
    plot_data_routine <- as.data.frame(eff_routine)
    
    # Ensure the response.level has the desired levels for "routine"
    plot_data_routine$response.level <- factor(plot_data_routine$response.level, 
                                               levels = c("X1", "X2", "X3", "X4", "X5"),
                                               labels = c("1", "2", "3", "4", "5"))
    
    # Create the second plot for "routine"
    p2 <- ggplot(plot_data_routine, aes(x = x, y = predicted, color = response.level, group = response.level)) +
      geom_line(linewidth = 1) +
      geom_point(size = 3) +
      geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.1, linewidth = 0.5) +
      scale_color_viridis_d() +
      labs(
        x = "Routine",
        y = "Predicted Probability",
        color = "Rating"
      ) +
      facet_grid(~ response.level) +
      theme_minimal() +
      theme(
        legend.position = "right",
        panel.grid.minor = element_blank(),
        axis.title = element_text(face = "bold"),
        plot.title = element_text(face = "bold"),
        plot.subtitle = element_text(margin = margin(b = 10))
      ) +
      scale_y_continuous(labels = scales::percent_format(accuracy = 1))
    
    p2

    Show the code
    # Generate the interaction effects for "freq" and "routine"
    eff_interaction <- ggeffect(survey.polr, terms = c("freq", "routine"))
    
    # Convert to a data frame
    plot_data_interaction <- as.data.frame(eff_interaction)
    
    # Ensure the response.level has the desired levels
    plot_data_interaction$response.level <- factor(plot_data_interaction$response.level, 
                                                   levels = c("X1", "X2", "X3", "X4", "X5"),
                                                   labels = c("1", "2", "3", "4", "5"))
    
    
    # Create the interaction plot with facet by 'x' and color by 'response.level'
    p_interaction <- ggplot(plot_data_interaction, aes(x = group, y = predicted, color = response.level, group = response.level)) +
      geom_line(linewidth = 1) +
      geom_point(size = 3) +
      geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 0.1, linewidth = 0.5) +
      scale_color_viridis_d() +
      labs(
        title = "Predicted Probabilities for Interaction of Frequency and Routine",
        x = "Routine",
        y = "Predicted Probability",
        color = "Rating"
      ) +
      facet_wrap(~ x, labeller = labeller(x = c("hi" = "High Frequency", "lo" = "Low Frequency"))) +  # Facet by "freq"
      theme_minimal() +
      theme(
        legend.position = "right",
        panel.grid.minor = element_blank(),
        axis.title = element_text(face = "bold"),
        plot.title = element_text(face = "bold"),
        plot.subtitle = element_text(margin = margin(b = 10))
      ) +
      scale_y_continuous(labels = scales::percent_format(accuracy = 1))
    
    # Display the interaction plot
    p_interaction

    Using clm() from the ordinal library

    # Convert to factor and determine ordering
    survey$rating <- factor(survey$rating, ordered = TRUE, levels = c("1", "2", "3", "4", "5"))
    
    # Fit cumulative link model
    clm.1 <- ordinal::clm(rating ~ 
                        freq +
                        routine,
                        data = survey, Hess=TRUE)
    
    # Model summary
    summary(clm.1)
    tab_model(clm.1, show.se = TRUE, show.aic = TRUE, show.dev = TRUE, transform = NULL)
      rating
    Predictors Log-Odds std. Error CI p
    1|2 -0.87 0.12 -1.11 – -0.63 <0.001
    2|3 0.13 0.12 -0.10 – 0.37 0.259
    3|4 1.25 0.13 1.00 – 1.51 <0.001
    4|5 2.89 0.20 2.50 – 3.28 <0.001
    freq [lo] -0.01 0.13 -0.26 – 0.24 0.932
    routine [lo] -0.56 0.13 -0.81 – -0.30 <0.001
    Observations 784
    R2 Nagelkerke 0.024
    AIC 2258.662

    Mixed-effects ordinal regression

    Recap: Mixed-effects models

    If the data is nested according to some grouping factor with \(1, ..., k\) groups, we can let the intercept and/or slopes vary by group. For instance, recall the varying-intercept model:

    \[ Y = \alpha_{k} + \beta_1X_{1} + \beta_2X_{2} + ... + \epsilon \qquad \alpha_{k} \sim N(\mu_{\alpha}, \sigma_{\alpha}^2). \] In this case we also speak of random effects.

    # Convert to factor and determine ordering
    survey$rating <- factor(survey$rating, ordered = TRUE, levels = c("1", "2", "3", "4", "5"))
    
    # Fit mixed model with random intercepts for "verb" and "ParticipantID"
    clm.2 <- ordinal::clmm(rating ~ 
                        freq * routine +
                        (1 | verb) +
                        (1 | ParticipantID),
                        data = survey, Hess=TRUE)
    
    # Model summary
    summary(clm.2)
    Cumulative Link Mixed Model fitted with the Laplace approximation
    
    formula: rating ~ freq * routine + (1 | verb) + (1 | ParticipantID)
    data:    survey
    
     link  threshold nobs logLik  AIC     niter     max.grad cond.H 
     logit flexible  784  -991.65 2001.30 598(2386) 3.81e-04 1.5e+02
    
    Random effects:
     Groups        Name        Variance Std.Dev.
     ParticipantID (Intercept) 2.0628   1.4363  
     verb          (Intercept) 0.9059   0.9518  
    Number of groups:  ParticipantID 98,  verb 8 
    
    Coefficients:
                     Estimate Std. Error z value Pr(>|z|)    
    freqlo            -0.1567     0.2138  -0.733 0.463437    
    routinelo         -0.7473     0.2103  -3.553 0.000381 ***
    freqlo:routinelo  -0.1406     0.2981  -0.472 0.637046    
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Threshold coefficients:
        Estimate Std. Error z value
    1|2  -1.3562     0.4020  -3.374
    2|3   0.2062     0.3988   0.517
    3|4   1.8329     0.4053   4.523
    4|5   3.8639     0.4401   8.780
    tab_model(clm.2, show.se = TRUE, show.aic = TRUE, show.dev = TRUE, transform = NULL)
      rating
    Predictors Log-Odds std. Error CI p
    1|2 -1.36 0.40 -2.14 – -0.57 0.001
    2|3 0.21 0.40 -0.58 – 0.99 0.605
    3|4 1.83 0.41 1.04 – 2.63 <0.001
    4|5 3.86 0.44 3.00 – 4.73 <0.001
    freq [lo] -0.16 0.21 -0.58 – 0.26 0.463
    routine [lo] -0.75 0.21 -1.16 – -0.34 <0.001
    freq [lo] × routine [lo] -0.14 0.30 -0.72 – 0.44 0.637
    Random Effects
    σ2 3.29
    τ00 ParticipantID 2.06
    τ00 verb 0.91
    ICC 0.47
    N verb 8
    N ParticipantID 98
    Observations 784
    Marginal R2 / Conditional R2 0.028 / 0.489
    AIC 2001.305
    Show the code
    # Extract random effects
    re_verb <- ranef(clm.2)$verb
    re_participant <- ranef(clm.2)$ParticipantID
    
    # Create dataframes for random effects
    df_verb <- data.frame(verb = rownames(re_verb), re = re_verb[,1])
    df_participant <- data.frame(ParticipantID = rownames(re_participant), re = re_participant[,1])
    
    # Get predictions for an average case
    pred_avg <- ggpredict(clm.2, terms = c("freq", "routine"))
    
    # Add random effects to predictions
    pred_verb <- crossing(pred_avg, df_verb) %>%
      mutate(predicted = predicted + re)
    
    pred_participant <- crossing(pred_avg, df_participant) %>%
      mutate(predicted = predicted + re)
    
    # Create a horizontal dot plot for random effects of participants
    p_caterpillar <- ggplot(df_participant, aes(x = re, y = reorder(ParticipantID, re))) +
      geom_point(size = 3, color = "steelblue3") +  # Dots representing the random effects
      labs(title = "Random Effects for Participants", 
           x = "Random Effect Estimate (log odds)", 
           y = "Participant ID") +
       geom_vline(xintercept = 0, linetype = "dashed", color = "grey20") +
      theme_minimal() +
      theme(panel.grid.major.y = element_blank())  # Removes the gridlines for y-axis
    
    p_caterpillar2 <- ggplot(df_verb, aes(x = re, y = reorder(verb, re))) +
      geom_point(size = 3, color = "steelblue3") +  # Dots representing the random effects
      labs(title = "Random Effects for Verbs", 
           x = "Random Effect Estimate (log odds)", 
           y = "Verb") +
       geom_vline(xintercept = 0, linetype = "dashed", color = "grey20") +
      theme_minimal() +
      theme(panel.grid.major.y = element_blank())  # Removes the gridlines for y-axis
    
    ggarrange(p_caterpillar, p_caterpillar2, ncol = 2, common.legend = TRUE, legend = "right")

    Generalised Additive Mixed-effects Models (GAMMs)

    Suggested reading

    For linguists:

    Baayen & Linke (2020)

    General:

    Hastie & Tibshirani (1991)

    Wood (2006)

    Rationale

    A core assumption of Generalised Linear Models (GLMs) is a linear relationship between predictor(s) and response. If, however, one is interested in exploring potential non-linear trends without the risk of extreme overfitting, GAMs offer an elegant solution: Instead of relying on the linear sum of model coefficients, GAMs estimate more flexible smooth terms \(f_k\) for \(k = 1, ..., p\). For illustration, Equation 8 shows a linear additive model for a continuous target variable with \(p\) predictors.

    \[ Y = \beta_0 + \sum\limits_{k = 1}^p f_k(X_k) \tag{8}\]

    Application in R

    # Load libraries
    library(mgcv)
    Loading required package: nlme
    
    Attaching package: 'nlme'
    The following object is masked from 'package:dplyr':
    
        collapse
    This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
    library(itsadug)
    Loading required package: plotfunctions
    
    Attaching package: 'plotfunctions'
    The following object is masked from 'package:ggpubr':
    
        get_palette
    The following object is masked from 'package:ggplot2':
    
        alpha
    Loaded package itsadug 2.4 (see 'help("itsadug")' ).
    library(gratia)
    
    # Convert predictors to factors
    survey$ParticipantID <- as.factor(survey$ParticipantID)
    survey$verb <- as.factor(survey$verb)
    
    # Fit GAMM
    gam1 <- bam(as.numeric(rating) ~ # treated as numeric term
                  freq + # linear term
                  routine + # linear term
                  s(ParticipantID, bs = "re") + # smooth term
                  s(verb, bs = "re"), # smooth term
                  data = survey, 
                  family = ocat(R = 5) # number of ordinal categories
                )
    
    # Model summary
    summary(gam1)
    
    Family: Ordered Categorical(-1,0.51,2.09,4.06) 
    Link function: identity 
    
    Formula:
    as.numeric(rating) ~ freq + routine + s(ParticipantID, bs = "re") + 
        s(verb, bs = "re")
    
    Parametric coefficients:
                Estimate Std. Error t value Pr(>|t|)    
    (Intercept)   0.3613     0.3880   0.931    0.352    
    freqlo       -0.2195     0.1431  -1.534    0.125    
    routinelo    -0.7752     0.1428  -5.430 7.79e-08 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Approximate significance of smooth terms:
                        edf Ref.df      F p-value    
    s(ParticipantID) 77.654     97  4.677  <2e-16 ***
    s(verb)           6.728      7 25.860  <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Deviance explained = 27.3%
    fREML =   1631  Scale est. = 1         n = 784
    # Extract the intercepts for plotting
    thresh <- gratia::theta(gam1) %>% 
      tibble::as_tibble() %>% 
      setNames(c("threshold"))
    
    # Extract predictions for "routine"
    routine_pred <- ggpredict(gam1, terms = "routine")
    
    # Plot predictions
    routine_pred %>% 
      ggplot(aes(x = x, y = predicted)) +
      geom_point(col = "steelblue") +
      geom_errorbar(aes(ymin = conf.low, ymax = conf.high, width = 0), col = "steelblue") +
      geom_hline(data = thresh, aes(yintercept = threshold), linetype = "dashed") +
      theme_minimal()

    # Extract random effects for "verb"
    verb_pred <- ggpredict(gam1, terms = "verb")
    
    # Plot random effect
    verb_pred %>% 
      ggplot(aes(x = x, y = predicted)) +
      geom_point(col = "steelblue") +
      geom_errorbar(aes(ymin = conf.low, ymax = conf.high, width = 0), col = "steelblue") +
      geom_hline(data = thresh, aes(yintercept = threshold), linetype = "dashed") +
      theme_minimal()

    # Extract random effects for "ParticipantID"
    subj_pred <- ggpredict(gam1, terms = "ParticipantID")
    
    # Plot random effect
    subj_pred |>
      ggplot(aes(x = x, y = predicted)) +
      geom_point(col = "steelblue") +
      geom_errorbar(aes(ymin = conf.low, ymax = conf.high, width = 0), col = "steelblue") +
      #geom_line() +
      geom_hline(data = thresh, aes(yintercept = threshold), linetype = "dashed") +
      theme_minimal()

    References

    Baayen, R. Harald, and Maja Linke. 2020. “Generalized Additive Mixed Models.” In A Practical Handbook of Corpus Linguistics, edited by Magali Paquot and Stefan Thomas Gries, 563–91. Cham: Springer.
    Baguley, Thomas. 2012. Serious Stats: A Guide to Advanced Statistics for the Behavioral Sciences. Houndmills, Basingstoke: Palgrave Macmillan.
    Glass, Lelia. 2021. “English Verbs Can Omit Their Objects When They Describe Routines.” English Language and Linguistics 26 (1): 49–73. https://doi.org/10.1017/S1360674321000022.
    Hastie, Trevor, and Robert Tibshirani. 1991. Generalized Additive Models. London: Chapman & Hall.
    O’Connell, Ann A. 2006. Logistic Regression Models for Ordinal Response Variables. Vol. 146. Thousand Oaks, Calif.: Sage.
    Powers, Daniel A., and Yu Xie. 2008. Statistical Methods for Categorical Data Analysis. 2. ed. Bingley: Emerald.
    Wood, Simon N. 2006. Generalized Additive Models: An Introduction with R. Boca Raton: Chapman & Hall/CRC.
    6.4 Poisson regression