In her recent contribution, Glass (2021) examines possible reasons why certain transitive verbs have a stronger affinity for object omission than others, placing special emphasis on the routinisation of the actions denoted by the verbs. Specifically, she assesses how high- and low-routine contexts affect the acceptability of object omission for transitive verbs from different frequency bins.
We will replicate her findings using her survey data Glass_2021_survey_processed.csv:
1 The original dataset can be retrieved from Lelia Glass’s OSF repository: https://osf.io/t6zw5 [Last accessed: 27th September, 2024].
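To follow along, the data can be loaded as sketched below (the file name matches the download from the OSF repository; the file is assumed to sit in the working directory):

library(ggplot2)  # used for the descriptive plots further below

# Read in the processed survey data
survey <- read.csv("Glass_2021_survey_processed.csv")

# Inspect the first six observations
head(survey)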
rating freq verb ParticipantID routine
1 3 hi pick Participant1 hi
2 4 lo catch Participant1 lo
3 3 lo throw Participant1 lo
4 3 hi break Participant1 lo
5 2 hi taste Participant1 hi
6 4 hi bottle Participant1 hi
Short breakdown of the variables
routine: In Glass’s study, transitive verbs were randomly assigned to one of the following conditions:
(High-routine condition:) I worked at my poultry farm. Just like I always do, I butchered some chickens. Then I gathered some eggs.
(Low-routine condition:) I visited a friend’s job. Just because people wanted me to try it, I butchered some chickens. Then I went for a walk.
rating records the responses of participants to a follow-up question regarding the acceptability of object omission. The answers are recorded on a 1-5 Likert scale.
The next time Caroline talks about butchering chickens the day before, how likely do you think she is to say the following?
freq indicates the frequency bin of each verb:
unique(survey$freq)
[1] "hi" "lo"
ParticipantID identifies each of the 98 subjects who provided ratings
24.3 Descriptive overview
ggplot(survey, aes(x = rating, fill = freq)) +
  geom_density(alpha = 0.7) +
  facet_wrap(~ freq) +
  theme_minimal() +
  labs(title = "Density of Ratings by Frequency",
       x = "Rating", y = "Density", fill = "Frequency")
ggplot(survey, aes(x = rating, fill = routine)) +
  geom_density(alpha = 0.7) +
  facet_wrap(~ routine) +
  theme_minimal() +
  labs(title = "Density of Ratings by Routine",
       x = "Rating", y = "Density", fill = "Routine")
library(ggridges)

ggplot(survey, aes(x = rating, y = verb, fill = verb)) +
  geom_density_ridges(scale = 5, rel_min_height = 0.01, alpha = 0.6) +
  theme_ridges() +
  # theme(legend.position = "none") +
  labs(title = "Distribution of Ratings by Verb",
       x = "Rating", y = "Verb")
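The plots can be complemented with a quick numerical overview; the dplyr sketch below (not part of the original analysis) reports the mean rating and the number of observations per freq × routine cell:

library(dplyr)

# Mean rating and cell size for each freq x routine combination
survey %>%
  group_by(freq, routine) %>%
  summarise(mean_rating = mean(as.numeric(rating)),
            n = n(),
            .groups = "drop")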
24.4 Modelling ordinal data
Our task is clear: we need to measure how routine and freq affect the acceptability ratings while controlling for the repeated measurements on verb and ParticipantID, which impose a hierarchical structure on the dataset.
Formally speaking, we have \(p\) explanatory variables \(X_1, X_2, ..., X_p\). The target variable, i.e. our \(Y\), is rating with the ordered, discrete outcomes \(y \in \{1, 2, 3, 4, 5\}\).
The goal is to find a model \(f\) that describes the relationship between \(Y\) and \(X_1, ..., X_p\) as accurately as possible while minimising the error term \(\epsilon\):
\[
Y = f(X_1, X_2, ..., X_p) + \epsilon
\tag{24.1}\]
24.4.1 Ordered logistic regression
One family of models that respects the ordered, yet categorical nature of \(Y\) is ordered (or ordinal) logistic regression. Other terms include proportional odds models and cumulative logit/link models.
Recap: Logistic regression
Logistic regression is used to model categorical response variables with two or more levels. For instance, let’s assume our \(Y\) is dichotomous with the following two outcomes:
\[
Y =
\begin{cases}
\text{yes} \\
\text{no}
\end{cases}
\tag{24.2}\]
Using the logistic function, we can estimate the probability of one outcome versus the other given the predictors \(X_1, ..., X_p\). The log-transformed odds (log odds) of this probability are equivalent to the all-too-familiar linear model:
\[
\log\left(\frac{P(Y = \text{yes})}{P(Y = \text{no})}\right) = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p
\]
Core to this approach is the notion of cumulative probabilities. Let \(J\) denote the number of ordered categories in \(Y\). In Glass’s case study, the estimated cumulative probabilities for each ordered outcome (= acceptability rating) would have the forms in Equation 24.4.
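Although sign conventions differ across textbooks and software, a sketch of these cumulative logits, following the parameterisation of polr() and clmm() used below (cutpoint minus linear predictor), looks as follows:
\[
\text{logit}\, P(Y \leq j) = \beta_{0_j} - (\beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p) \qquad \text{for } j = 1, ..., J - 1.
\]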
The intercepts \(\beta_{0_j}\) serve as cutpoints between the adjacent ordinal categories. For \(J = 5\) categories, there are \(J - 1 = 4\) cutpoints, i.e.,
1|2 for \(P(Y \leq 1)\)
2|3 for \(P(Y \leq 2)\)
3|4 for \(P(Y \leq 3)\)
4|5 for \(P(Y \leq 4)\).
Given a change in predictor values, the slope coefficients \(\beta_p\) indicate how the probability of being in a higher rating category changes (Baguley 2012: 691–2).
We can obtain “regular” probabilities from the cumulative ones by drawing on the equivalence in Equation 24.6.
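In other words, each category probability is the difference between two adjacent cumulative probabilities:
\[
P(Y = j) = P(Y \leq j) - P(Y \leq j - 1).
\]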
The proportional odds assumption stipulates a stable effect of the predictors on the (log) odds of the ordinal outcomes across all possible cutpoints (O’Connell 2006: 29). In case of violation, it is better to rely on partial proportional odds models or multinomial logistic regression instead.
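For illustration, a cumulative model without the proportional-odds constraint could be fitted with the VGAM package (a sketch, not part of Glass’s analysis; parallel = FALSE relaxes the constraint so that each cutpoint gets its own slope coefficients):

library(VGAM)

# Cumulative logit model without the proportional-odds constraint
vglm(factor(rating, ordered = TRUE) ~ freq + routine,
     family = cumulative(parallel = FALSE),
     data = survey)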
24.5 Application in R
Several R packages support ordinal logistic regression. This section provides an overview of some of the more common and well-documented implementations.
24.5.1 Using polr() from the MASS library
library(MASS)

# Convert to factor and determine ordering
survey$rating <- factor(survey$rating, ordered = TRUE, levels = c("1", "2", "3", "4", "5"))

# Fit polr model
survey.polr <- polr(rating ~ freq + routine, data = survey)

# Model summary
summary(survey.polr)
Call:
polr(formula = rating ~ freq + routine, data = survey)
Coefficients:
Value Std. Error t value
freqlo -0.01095 0.1291 -0.08483
routinelo -0.55521 0.1302 -4.26449
Intercepts:
Value Std. Error t value
1|2 -0.8704 0.1228 -7.0859
2|3 0.1342 0.1188 1.1290
3|4 1.2528 0.1293 9.6856
4|5 2.8915 0.2003 14.4345
Residual Deviance: 2246.662
AIC: 2258.662
library(DescTools)

# Pseudo R-squared and AIC
PseudoR2(survey.polr, c("Nagelkerke", "AIC"))
Coefficients: The conditions freqlo (low frequency) and routinelo (low-routine context) both have negative values, which means that both of them decrease the probability of obtaining a higher acceptability rating (compared to freqhi and routinehi).
Intercepts: These represent the cutpoints between the ordinal categories, which are necessary for calculating the probabilities of each ordinal category.
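To see how the cutpoints translate into category probabilities, the following sketch (not part of the original code) converts the estimated cutpoints into predicted probabilities for the reference levels (freq = "hi", routine = "hi"), where the linear predictor is zero:

# Cumulative probabilities at the cutpoints (linear predictor = 0)
cum_p <- plogis(survey.polr$zeta)

# Category probabilities P(Y = j) via successive differences
round(diff(c(0, cum_p, 1)), 3)

# The same result can be obtained with the built-in predict() method:
# predict(survey.polr, newdata = data.frame(freq = "hi", routine = "hi"), type = "probs")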
24.5.2 Testing assumptions and goodness of fit
Test proportional odds assumption:
library(brant)

brant(survey.polr) # p < 0.05 indicates a violation of the assumption
--------------------------------------------
Test for X2 df probability
--------------------------------------------
Omnibus 14.45 6 0.02
freqlo 6.5 3 0.09
routinelo 8.14 3 0.04
--------------------------------------------
H0: Parallel Regression Assumption holds
Another goodness-of-fit check is the Hosmer-Lemeshow test, which is essentially a \(\chi^2\)-test comparing observed and model-predicted frequencies:
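One possible implementation (a sketch, assuming the generalhoslem package and its logitgof() function) compares the observed ratings with the matrix of fitted category probabilities:

library(generalhoslem)

# Hosmer-Lemeshow-style goodness-of-fit test for the ordinal model
logitgof(survey$rating, fitted(survey.polr), ord = TRUE)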
Recall that the ratings are nested within both verbs and participants. If the data are nested according to some grouping factor with groups \(1, ..., k\), we can let the intercept and/or the slopes vary by group. For instance, recall the varying-intercept model:
\[
Y = \alpha_{k} + \beta_1X_{1} + \beta_2X_{2} + ... + \epsilon \qquad \alpha_{k} \sim N(\mu_{\alpha}, \sigma_{\alpha}^2).
\]
In this case we also speak of random effects.
# Convert to factor and determine ordering
survey$rating <- factor(survey$rating, ordered = TRUE, levels = c("1", "2", "3", "4", "5"))

# Fit mixed model with random intercepts for "verb" and "ParticipantID"
clm.2 <- ordinal::clmm(rating ~ freq * routine + (1 | verb) + (1 | ParticipantID),
                       data = survey, Hess = TRUE)

# Model summary
summary(clm.2)
Cumulative Link Mixed Model fitted with the Laplace approximation
formula: rating ~ freq * routine + (1 | verb) + (1 | ParticipantID)
data: survey
link threshold nobs logLik AIC niter max.grad cond.H
logit flexible 784 -991.65 2001.30 598(2386) 3.81e-04 1.5e+02
Random effects:
Groups Name Variance Std.Dev.
ParticipantID (Intercept) 2.0628 1.4363
verb (Intercept) 0.9059 0.9518
Number of groups: ParticipantID 98, verb 8
Coefficients:
Estimate Std. Error z value Pr(>|z|)
freqlo -0.1567 0.2138 -0.733 0.463437
routinelo -0.7473 0.2103 -3.553 0.000381 ***
freqlo:routinelo -0.1406 0.2981 -0.472 0.637046
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Threshold coefficients:
Estimate Std. Error z value
1|2 -1.3562 0.4020 -3.374
2|3 0.2062 0.3988 0.517
3|4 1.8329 0.4053 4.523
4|5 3.8639 0.4401 8.780
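As a quick interpretation aid (an illustration, not part of the original output), exponentiating a fixed-effect estimate converts it into an odds ratio:

# Odds ratio for the low-routine condition, computed from the printed estimate
exp(-0.7473)
# ~ 0.47: in the low-routine condition, the odds of receiving a higher
# acceptability rating are roughly halved compared to the high-routine condition.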
library(ggeffects)
library(tidyr)
library(dplyr)
library(ggpubr)

# Extract random effects
re_verb <- ranef(clm.2)$verb
re_participant <- ranef(clm.2)$ParticipantID

# Create dataframes for random effects
df_verb <- data.frame(verb = rownames(re_verb), re = re_verb[, 1])
df_participant <- data.frame(ParticipantID = rownames(re_participant), re = re_participant[, 1])

# Get predictions for an average case
pred_avg <- ggpredict(clm.2, terms = c("freq", "routine"))

# Add random effects to predictions
pred_verb <- crossing(pred_avg, df_verb) %>%
  mutate(predicted = predicted + re)
pred_participant <- crossing(pred_avg, df_participant) %>%
  mutate(predicted = predicted + re)

# Create a horizontal dot plot for random effects of participants
p_caterpillar <- ggplot(df_participant, aes(x = re, y = reorder(ParticipantID, re))) +
  geom_point(size = 3, color = "steelblue3") +  # Dots representing the random effects
  labs(title = "Random Effects for Participants",
       x = "Random Effect Estimate (log odds)", y = "Participant ID") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey20") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank())  # Removes the gridlines for y-axis

p_caterpillar2 <- ggplot(df_verb, aes(x = re, y = reorder(verb, re))) +
  geom_point(size = 3, color = "steelblue3") +  # Dots representing the random effects
  labs(title = "Random Effects for Verbs",
       x = "Random Effect Estimate (log odds)", y = "Verb") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "grey20") +
  theme_minimal() +
  theme(panel.grid.major.y = element_blank())  # Removes the gridlines for y-axis

ggarrange(p_caterpillar, p_caterpillar2, ncol = 2, common.legend = TRUE, legend = "right")
A core assumption of Generalised Linear Models (GLMs) is a linear relationship between predictor(s) and response. If, however, one is interested in exploring potential non-linear trends without the risk of extreme overfitting, GAMs offer an elegant solution: instead of relying on a linear sum of model coefficients, GAMs estimate more flexible smooth terms \(f_k\) for \(k = 1, ..., p\). For illustration, Equation 24.8 shows an additive model for a continuous target variable with \(p\) predictors.
\[
Y = \beta_0 + \sum\limits_{k = 1}^p f_k(X_k)
\tag{24.8}\]
24.6.3 Application in R
# Load libraries
library(mgcv)
library(itsadug)
library(gratia)

# Convert predictors to factors
survey$ParticipantID <- as.factor(survey$ParticipantID)
survey$verb <- as.factor(survey$verb)

# Fit GAMM
gam1 <- bam(as.numeric(rating) ~            # treated as numeric term
              freq +                         # linear term
              routine +                      # linear term
              s(ParticipantID, bs = "re") +  # smooth term
              s(verb, bs = "re"),            # smooth term
            data = survey,
            family = ocat(R = 5)             # number of ordinal categories
            )

# Model summary
summary(gam1)
# Extract the intercepts for plotting
thresh <- gratia::theta(gam1) %>%
  tibble::as_tibble() %>%
  setNames(c("threshold"))

# Extract predictions for "routine"
routine_pred <- ggpredict(gam1, terms = "routine")

# Plot predictions
routine_pred %>%
  ggplot(aes(x = x, y = predicted)) +
  geom_point(col = "steelblue") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high, width = 0), col = "steelblue") +
  geom_hline(data = thresh, aes(yintercept = threshold), linetype = "dashed") +
  theme_minimal()
# Extract random effects for "verb"
verb_pred <- ggpredict(gam1, terms = "verb")

# Plot random effect
verb_pred %>%
  ggplot(aes(x = x, y = predicted)) +
  geom_point(col = "steelblue") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high, width = 0), col = "steelblue") +
  geom_hline(data = thresh, aes(yintercept = threshold), linetype = "dashed") +
  theme_minimal()
# Extract random effects for "ParticipantID"
subj_pred <- ggpredict(gam1, terms = "ParticipantID")

# Plot random effect
subj_pred |>
  ggplot(aes(x = x, y = predicted)) +
  geom_point(col = "steelblue") +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high, width = 0), col = "steelblue") +
  # geom_line() +
  geom_hline(data = thresh, aes(yintercept = threshold), linetype = "dashed") +
  theme_minimal()
Baayen, R. Harald, and Maja Linke. 2020. “Generalized Additive Mixed Models.” In A Practical Handbook of Corpus Linguistics, edited by Magali Paquot and Stefan Thomas Gries, 563–91. Cham: Springer.
Baguley, Thomas. 2012. Serious Stats: A Guide to Advanced Statistics for the Behavioral Sciences. Houndmills, Basingstoke: Palgrave Macmillan.
Glass, Lelia. 2021. “English Verbs Can Omit Their Objects When They Describe Routines.” English Language and Linguistics 26 (1): 49–73. https://doi.org/10.1017/S1360674321000022.
Hastie, Trevor, and Robert Tibshirani. 1991. Generalized Additive Models. London: Chapman & Hall.
O’Connell, Ann A. 2006. Logistic Regression Models for Ordinal Response Variables. Vol. 146. Thousand Oaks, Calif.: Sage.
Powers, Daniel A., and Yu Xie. 2008. Statistical Methods for Categorical Data Analysis. 2nd ed. Bingley: Emerald.
Wood, Simon N. 2006. Generalized Additive Models: An Introduction with R. Boca Raton: Chapman & Hall/CRC.