# Load libraries
library(tidyverse)
library(ggeffects)
library(readxl)
library(car)
# Load datasets
<- read_xlsx("winter_2020_visual.xlsx") verbs
26 Poisson regression
26.1 Recommended reading
Winter (2020): Chapter 13
Baguley (2012): Chapter 17.5
26.2 Preparation
26.3 The Poisson family
Frequency data is ubiquitous in corpus linguistics. Given its numeric nature, it seems tempting to model such data using linear regression, but doing so is bound to cause problems. Partially owing to the fact that count data is always positive, the residuals more often than not deviate from normality and additionally display non-constant variance. The figures below illustrate these issues for a linear model fitted to simulated count data.
A probability distribution that is much better equipped for this special case of discrete, yet numeric data is the Poisson distribution. Assuming a Poisson-distributed variable \(X\), its exact shape is determined by a single parameter, \(\lambda\) (lambda), which is both its mean and its variance. In other words,
\[ X \sim Pois(\lambda). \tag{26.1}\]
Its probability mass function has a notable negative skew, which becomes less conspicuous for increasingly higher parameter values.
26.4 Poisson regression
Recall the linear regression Equation 22.10, where the dependent variable \(Y\) was modelled as a function of the linear sum of predictor terms \(\beta_pX_p\) for \(p\) independent variables.
The Poisson model is very similar to the linear regression model, with the main differences being the logarithmic transformation of the response and the exclusion of the error term:
\[ \ln(Y) = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p. \tag{26.2}\]
To remove the logarithm and obtain the model output on a more intuitive scale, we simply exponentiate both sides:
\[ Y = e^{\beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p}. \tag{26.3}\]
26.4.1 Application in R
The data provided by Winter (2020) contains frequency data for hundreds of verbs (Freq
column) as well as a variety of psycholinguistic ratings (Sight
, Touch
, Sound
etc.), which were originally compiled by Lynott & Connell (2009).
head(verbs)
# A tibble: 6 × 9
Word DominantModality Sight Touch Sound Taste Smell Log10Freq Freq
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 abrasive Haptic 2.89 3.68 1.68 0.579 0.579 1.36 23
2 absorbent Visual 4.14 3.14 0.714 0.476 0.476 0.903 8
3 aching Haptic 2.05 3.67 0.667 0.0476 0.0952 2.02 105
4 acidic Gustatory 2.19 1.14 0.476 4.19 2.90 1.04 11
5 acrid Olfactory 1.12 0.625 0.375 3 3.5 0.301 2
6 adhesive Haptic 3.67 4.33 1.19 0.905 1.76 1.67 47
We will examine of the effects of different sensory ratings on the frequency of a verb. When fitting the generalized linear model, it’s important to indicate the argument family = "poisson"
to apply the correct (logarithmic) link function.
# Fit Poisson regression model
<- glm(Freq ~ Sight + Touch + Sound + Taste + Smell, data = verbs, family = "poisson")
freq.m1
# Summarise model statistics
summary(freq.m1)
Call:
glm(formula = Freq ~ Sight + Touch + Sound + Taste + Smell, family = "poisson",
data = verbs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.1201624 0.0118785 262.67 <2e-16 ***
Sight 0.8726608 0.0024550 355.46 <2e-16 ***
Touch 0.1490792 0.0009801 152.10 <2e-16 ***
Sound 0.1692741 0.0011662 145.15 <2e-16 ***
Taste 0.0790246 0.0019106 41.36 <2e-16 ***
Smell -0.0963918 0.0019746 -48.82 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 1890910 on 361 degrees of freedom
Residual deviance: 1633858 on 356 degrees of freedom
(61 observations deleted due to missingness)
AIC: 1636345
Number of Fisher Scoring iterations: 7
The model has identified numerous significant effects, i.e., \(\beta\)-values that are significantly different from 0. The low standard errors hint at very robust estimates, resulting in 95% confidence intervals that are barely visible in the effect plots.
# Get predicted values for each predictor
<- ggpredict(freq.m1, terms = "Sight")
pred_sight <- ggpredict(freq.m1, terms = "Touch")
pred_touch <- ggpredict(freq.m1, terms = "Sound")
pred_sound <- ggpredict(freq.m1, terms = "Taste")
pred_taste <- ggpredict(freq.m1, terms = "Smell")
pred_smell
# For plotting individual predictions, simply use:
# plot(pred_sight)
# Combine all predictions into one data frame
<- rbind(
pred_all data.frame(pred_sight, predictor = "Sight"),
data.frame(pred_touch, predictor = "Touch"),
data.frame(pred_sound, predictor = "Sound"),
data.frame(pred_taste, predictor = "Taste"),
data.frame(pred_smell, predictor = "Smell")
)
# Create faceted plot
ggplot(pred_all, aes(x = x, y = predicted)) +
geom_line() +
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.5) +
facet_wrap(~predictor, scales = "free_x") +
labs(x = "Rating", y = "Predicted verb frequency") +
theme_bw()
26.4.2 Interpretation
As we can see, verbs with higher Sight
ratings (i.e., highly visual verbs) are associated with the greatest increase in frequency of occurrence. Given a one-unit increase in \(X\), we can obtain the percentage increase (or decrease, respectively) using the formula \(100(e^b - 1)\). The predictor Sight
has an estimate of approximately \(0.87\) (\(p< 0.001\), 95% CI [0.87, 0.88]), so the proportional increase is \(100(e^{0.87} - 1) = 138.69\%\).
By contrast, Smell
is a associated with a lower frequency: \(100(e^{0.10} -1) = -9.51\%\) is the drop in frequency for each one-unit increase in smell ratings. In essence: Smelly verbs are unpopular!