17 Probability distributions
17.1 Suggested reading
Baguley (2012): Chapter 2
Agresti and Kateri (2022): Chapter 2
Heumann, Schomaker, and Shalabh (2022): Chapter 8
17.2 Continuous distributions
17.2.1 The normal distribution
Many numerical variables in the world follow the well-known normal (or Gaussian) distribution, including test scores, weight, and height, among many others. The plot below illustrates its characteristic bell shape: most observations lie near the middle, with considerably fewer in the tails. For example, most people are rather “average” in height; only a few people are extremely short or extremely tall.
The normal distribution is typically described in terms of two parameters: the population mean \(\mu\) and the population standard deviation \(\sigma\). If a random variable \(X\) is normally distributed, we typically use the notation in Equation 17.1; note that the second parameter in this notation is the variance \(\sigma^2\), i.e., the squared standard deviation.
\[ X \sim N(\mu, \sigma^2). \tag{17.1}\]
These two parameters affect the shape of the probability density function (PDF) \(f(x)\), which is formally defined as
\[ f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}. \tag{17.2}\]
In practice, this function traces out a bell curve. Quite interestingly, for any normal distribution,

- about 68% of all values fall within one standard deviation of the mean,
- about 95% within two, and
- about 99.7% within three.
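This 68-95-99.7 rule is easy to verify numerically. The following sketch uses only the Python standard library (the function names here are our own, chosen for illustration): the normal CDF is computed via the error function, and the probability of each interval is the difference of two CDF values.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def within_k_sigma(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= X <= mu + k*sigma)."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sigma(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

The percentages are the same for every choice of \(\mu\) and \(\sigma\), which is why the rule can be stated without reference to a particular normal distribution.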
In the plot, the \(y\)-axis indicates the density of population values. Note that since the Gaussian distribution is a continuous distribution with infinitely many possible \(x\)-values, the probability of any single value is 0. We can only obtain probabilities for intervals of values, which are given by
\[ P(a \leq X \leq b) = \int_a^b f(x)dx. \tag{17.3}\]
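To make the integral in Equation 17.3 concrete, the sketch below (an illustrative standard-library example; the function names are not from any particular package) approximates \(P(-1 \leq X \leq 1)\) for a standard normal variable by numerically integrating the density \(f(x)\) with the trapezoidal rule.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of N(mu, sigma^2), as in Equation 17.2."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

def prob_interval(a, b, mu=0.0, sigma=1.0, n=10_000):
    """Trapezoidal approximation of the integral of f(x) from a to b."""
    h = (b - a) / n
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    total += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, n))
    return total * h

print(prob_interval(-1.0, 1.0))  # close to 0.6827, matching the rule above
```

For real analyses one would of course use a library routine for the normal CDF rather than integrating the density by hand; the point here is only that interval probabilities are areas under \(f(x)\).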
We can find the population mean of \(X\) with a PDF \(f(x)\) via
\[ E(X) = \mu = \int_{-\infty}^{\infty} xf(x)dx, \tag{17.4}\]
where \(E(X)\) denotes the expected value of \(X\), i.e., the mean. Essentially, multiplying every value \(x\) by its respective probability density \(f(x)\) and integrating over all possible values of \(x\) will return \(E(X) = \mu\).
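This can also be checked numerically. The sketch below (again a standard-library illustration with hypothetical function names) approximates the integral in Equation 17.4 with a midpoint rule over \(\mu \pm 10\sigma\), outside of which the density is effectively zero; the result recovers the mean that was put in.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density f(x) of N(mu, sigma^2), as in Equation 17.2."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

def expected_value(mu, sigma, n=10_000):
    """Midpoint-rule approximation of E(X) = integral of x * f(x) dx."""
    a, b = mu - 10 * sigma, mu + 10 * sigma  # density is negligible beyond this
    h = (b - a) / n
    return sum((a + (i + 0.5) * h) * normal_pdf(a + (i + 0.5) * h, mu, sigma)
               for i in range(n)) * h

print(expected_value(5.0, 2.0))  # close to 5.0
```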
17.3 Discrete distributions
17.3.1 Bernoulli distribution
The Bernoulli distribution is a discrete probability distribution for random variables which have only two possible outcomes: “positive” (often coded as 1) and “negative” (often coded as 0). Examples of such variables include coin tosses (heads/tails), binary response questions (yes/no), and defect status (defective/non-defective).
If a random variable \(X\) follows a Bernoulli distribution, it is determined by the parameter \(p\), which is the probability of the positive case:
\[ X \sim Bernoulli(p).\] The probability mass function (PMF) of the Bernoulli distribution is given by: \[ P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases} \]
where \(0 \leq p \leq 1\). This function shows the probability of \(X\) taking on the value of 1 or 0 (cf. Heumann, Schomaker, and Shalabh 2022: 162-163).
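The PMF is simple enough to write down directly. The sketch below (function names are our own, for illustration) implements it together with a way to draw Bernoulli realisations using the standard library's random number generator.

```python
import random

def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p); x must be 0 or 1."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 and 1")
    return p if x == 1 else 1 - p

def bernoulli_sample(p):
    """Draw a single realisation of X ~ Bernoulli(p)."""
    return 1 if random.random() < p else 0

print(bernoulli_pmf(1, 0.3))  # 0.3
print(bernoulli_pmf(0, 0.3))  # 0.7
```

Note that the two probabilities necessarily sum to 1, since 1 and 0 are the only possible outcomes.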
17.3.2 Binomial distribution
The binomial distribution is a fairly straightforward extension of the Bernoulli distribution in that it models the number of successes in \(n\) independent Bernoulli trials, each with probability \(p\) of success. If a random variable \(X\) follows a binomial distribution with parameters \(n\) and \(p\), we write:
\[ X \sim Binomial(n,p) \tag{17.5}\]
The probability mass function (PMF) for the binomial distribution is:
\[ P(X = k) = \binom{n}{k} p^k (1-p)^{n-k} \tag{17.6}\]
where:
- \(n\) is the number of trials
- \(k\) is the number of successes \((0 \leq k \leq n)\)
- \(p\) is the probability of success on each trial \((0 \leq p \leq 1)\)
- \(\binom{n}{k}\) is the binomial coefficient (“n choose k”)
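The binomial PMF can be computed directly from Equation 17.6; Python's standard library provides the binomial coefficient as `math.comb`. The sketch below (the function name is our own) evaluates, for example, the probability of exactly 5 heads in 10 fair coin tosses.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), per Equation 17.6."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(binomial_pmf(5, 10, 0.5))  # 0.24609375
```

Two useful sanity checks: the probabilities over all \(k\) from 0 to \(n\) sum to 1, and with \(n = 1\) the binomial PMF reduces to the Bernoulli PMF above.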