26  Poisson regression

Author: Vladimir Buskin

Affiliation: Catholic University of Eichstätt-Ingolstadt

26.2 Preparation

# Load libraries
library(tidyverse)
library(ggeffects)
library(readxl)
library(car)

# Load datasets
verbs <- read_xlsx("winter_2020_visual.xlsx")

26.3 The Poisson family

Frequency data is ubiquitous in corpus linguistics. Given its numeric nature, it seems tempting to model such data using linear regression, but doing so is bound to cause problems. Partly because count data is discrete and never negative, the residuals more often than not deviate from normality and additionally display non-constant variance. The figures below illustrate these issues for a linear model fitted to simulated count data.
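A minimal simulation sketch illustrates the point (hypothetical data, not the Winter (2020) set): counts generated from a Poisson process are fitted with `lm()`, and the residual diagnostics show the expected violations.

```r
# A minimal sketch (simulated, hypothetical data): fit a linear model
# to Poisson counts and inspect the residuals
set.seed(123)
x <- runif(200, 0, 3)                  # hypothetical numeric predictor
y <- rpois(200, lambda = exp(1 + x))   # counts whose mean grows with x

lm.sim <- lm(y ~ x)

# Normality of the residuals is typically rejected ...
shapiro.test(residuals(lm.sim))

# ... and the residual spread fans out with the fitted values
plot(fitted(lm.sim), residuals(lm.sim),
     xlab = "Fitted values", ylab = "Residuals")
```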

A probability distribution that is much better equipped for this special case of discrete, yet numeric data is the Poisson distribution. Assuming a Poisson-distributed variable \(X\), its exact shape is determined by a single parameter, \(\lambda\) (lambda), which is both its mean and its variance. In other words,

\[ X \sim Pois(\lambda). \tag{26.1}\]

Its probability mass function has a notable positive (right) skew, which becomes less conspicuous for increasingly higher parameter values.
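A quick numerical sketch (simulated draws, not part of the chapter's data) confirms both properties: the sample mean and variance of a Poisson sample coincide, and the skew shrinks as \(\lambda\) grows.

```r
# Simulated check of the mean-variance identity of the Poisson distribution
set.seed(42)
x_small <- rpois(100000, lambda = 2)
x_large <- rpois(100000, lambda = 30)

c(mean(x_small), var(x_small))   # both close to 2
c(mean(x_large), var(x_large))   # both close to 30

# Theoretical skewness of Pois(lambda) is 1/sqrt(lambda):
1 / sqrt(c(2, 30))               # ~0.71 vs. ~0.18 (flatter for larger lambda)
```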

26.4 Poisson regression

Recall the linear regression model in Equation 22.10, where the dependent variable \(Y\) was modelled as the linear sum of predictor terms \(\beta_1X_1, \dots, \beta_pX_p\) for \(p\) independent variables.

The Poisson model is very similar to the linear regression model, the main differences being that it models the logarithm of the expected response \(E(Y)\) and that it has no additive error term:

\[ \ln(E(Y)) = \beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p. \tag{26.2}\]

To remove the logarithm and obtain the model output on the more intuitive count scale, we simply exponentiate both sides:

\[ E(Y) = e^{\beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p}. \tag{26.3}\]
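A toy calculation with made-up coefficient values (hypothetical numbers, not estimates from any model in this chapter) shows how the two scales relate:

```r
# Hypothetical coefficients for a model with a single predictor
b0 <- 0.5
b1 <- 0.8
x1 <- 2

eta <- b0 + b1 * x1   # linear predictor on the log scale
exp(eta)              # predicted count on the response scale: exp(2.1) ~ 8.17
```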

26.4.1 Application in R

The data provided by Winter (2020) contains frequency data for hundreds of verbs (Freq column) as well as a variety of psycholinguistic ratings (Sight, Touch, Sound etc.), which were originally compiled by Lynott & Connell (2009).

head(verbs)
# A tibble: 6 × 9
  Word      DominantModality Sight Touch Sound  Taste  Smell Log10Freq  Freq
  <chr>     <chr>            <dbl> <dbl> <dbl>  <dbl>  <dbl>     <dbl> <dbl>
1 abrasive  Haptic            2.89 3.68  1.68  0.579  0.579      1.36     23
2 absorbent Visual            4.14 3.14  0.714 0.476  0.476      0.903     8
3 aching    Haptic            2.05 3.67  0.667 0.0476 0.0952     2.02    105
4 acidic    Gustatory         2.19 1.14  0.476 4.19   2.90       1.04     11
5 acrid     Olfactory         1.12 0.625 0.375 3      3.5        0.301     2
6 adhesive  Haptic            3.67 4.33  1.19  0.905  1.76       1.67     47

We will examine the effects of different sensory ratings on the frequency of a verb. When fitting the generalized linear model, it’s important to specify the argument family = "poisson" to apply the correct (logarithmic) link function.

# Fit Poisson regression model
freq.m1 <- glm(Freq ~ Sight + Touch + Sound + Taste + Smell, data = verbs, family = "poisson")

# Summarise model statistics
summary(freq.m1)

Call:
glm(formula = Freq ~ Sight + Touch + Sound + Taste + Smell, family = "poisson", 
    data = verbs)

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  3.1201624  0.0118785  262.67   <2e-16 ***
Sight        0.8726608  0.0024550  355.46   <2e-16 ***
Touch        0.1490792  0.0009801  152.10   <2e-16 ***
Sound        0.1692741  0.0011662  145.15   <2e-16 ***
Taste        0.0790246  0.0019106   41.36   <2e-16 ***
Smell       -0.0963918  0.0019746  -48.82   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1890910  on 361  degrees of freedom
Residual deviance: 1633858  on 356  degrees of freedom
  (61 observations deleted due to missingness)
AIC: 1636345

Number of Fisher Scoring iterations: 7

The model has identified numerous significant effects, i.e., \(\beta\)-values that are significantly different from 0. The low standard errors hint at very robust estimates, resulting in 95% confidence intervals that are barely visible in the effect plots.

# Get predicted values for each predictor
pred_sight <- ggpredict(freq.m1, terms = "Sight")
pred_touch <- ggpredict(freq.m1, terms = "Touch")
pred_sound <- ggpredict(freq.m1, terms = "Sound")
pred_taste <- ggpredict(freq.m1, terms = "Taste")
pred_smell <- ggpredict(freq.m1, terms = "Smell")

# For plotting individual predictions, simply use:
# plot(pred_sight) 

# Combine all predictions into one data frame
pred_all <- rbind(
  data.frame(pred_sight, predictor = "Sight"),
  data.frame(pred_touch, predictor = "Touch"),
  data.frame(pred_sound, predictor = "Sound"),
  data.frame(pred_taste, predictor = "Taste"),
  data.frame(pred_smell, predictor = "Smell")
)

# Create faceted plot
ggplot(pred_all, aes(x = x, y = predicted)) +
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.5) +
  facet_wrap(~predictor, scales = "free_x") +
  labs(x = "Rating", y = "Predicted verb frequency") +
  theme_bw()

26.4.2 Interpretation

As we can see, verbs with higher Sight ratings (i.e., highly visual verbs) are associated with the greatest increase in frequency of occurrence. For a one-unit increase in \(X\), we can obtain the percentage increase (or decrease, respectively) using the formula \(100(e^{\beta} - 1)\). The predictor Sight has an estimate of approximately \(0.87\) (\(p< 0.001\), 95% CI [0.87, 0.88]), so the percentage increase is \(100(e^{0.87} - 1) \approx 138.69\%\).
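The same transformation can be applied to all slope estimates at once. The sketch below recomputes the percentage changes from the full-precision coefficients printed in the model summary above (so the figures differ slightly from those obtained with rounded estimates):

```r
# Percentage change in expected frequency per one-unit rating increase,
# using the coefficient estimates from summary(freq.m1) above
b <- c(Sight = 0.8726608, Touch = 0.1490792, Sound = 0.1692741,
       Taste = 0.0790246, Smell = -0.0963918)

round(100 * (exp(b) - 1), 1)
# Sight ~ +139.3%, Touch ~ +16.1%, Sound ~ +18.4%, Taste ~ +8.2%, Smell ~ -9.2%
```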

By contrast, Smell is associated with a lower frequency: \(100(e^{-0.10} - 1) \approx -9.5\%\) is the drop in frequency for each one-unit increase in smell ratings. In essence: Smelly verbs are unpopular!