21 t-test
21.1 Preparation
- Load packages and data:

library("readxl")
library("tidyverse")

data_vowels <- read.csv("Vowels_Apache.csv", sep = "\t")
21.2 The \(t\)-test
Since the \(\chi^2\) measure works exclusively with categorical variables, a separate test statistic is required when one of the variables is continuous. The \(t\) statistic is often used for research questions involving differences between sample means. How \(t\) is calculated depends on the origin of \(X\) and \(Y\): do they come from two independent samples, or are they paired measurements from the same sample?
First, we consider two independent samples from a population:
Sample \(X\) with the observations \(\{x_1, x_2, ..., x_{n_1}\}\), sample size \(n_1\), sample mean \(\bar{x}\) and sample variance \(s^2_x\).
Sample \(Y\) with the observations \(\{y_1, y_2, ..., y_{n_2}\}\), sample size \(n_2\), sample mean \(\bar{y}\) and sample variance \(s^2_y\).
The \(t\)-statistic after Welch is given by:
\[ t(x, y) = \frac{|\bar{x} - \bar{y}|}{\sqrt{\frac{s^2_x}{n_1} + \frac{s^2_y}{n_2}}} \tag{21.1}\]
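Equation 21.1 can be checked directly in R. The sketch below uses two small simulated samples (not the chapter's vowel data) and compares the hand-computed statistic with the value returned by `t.test()`, which performs the Welch test by default:

```r
# Two toy samples (simulated; not the Apache vowel data)
set.seed(1)
x <- rnorm(30, mean = 520, sd = 100)
y <- rnorm(25, mean = 480, sd = 80)

# Welch's t by hand, following Equation 21.1
t_manual <- abs(mean(x) - mean(y)) /
  sqrt(var(x) / length(x) + var(y) / length(y))

# Built-in Welch test; take the absolute value to match Equation 21.1
t_builtin <- abs(unname(t.test(x, y)$statistic))

all.equal(t_manual, t_builtin)  # the two values agree
```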
If there is more than one observation per subject (e.g., before and after an experiment), the samples are called dependent or paired. The paired \(t\)-test assumes two continuous variables \(X\) and \(Y\).
In the paired test, the variable \(d\) denotes the difference between them, i.e., \(x - y\). The corresponding test statistic is obtained via
\[ t(x, y) = t(d) = \frac{\bar{d}}{s_d} \sqrt{n}. \tag{21.2}\]
Here, \(\bar{d} = \frac{1}{n}\sum_{i=1}^n{d_i}\) denotes the mean difference, and the variance is given by
\[ s^2_d = \frac{\sum_{i=1}^n({d_i} - \bar{d})^2}{n-1}. \tag{21.3}\]
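The paired statistic from Equation 21.2 can likewise be verified by hand. The example below uses simulated before/after measurements (hypothetical data, not from the chapter) and compares the manual result with `t.test(..., paired = TRUE)`:

```r
# Paired toy data: one measurement before and after per subject (simulated)
set.seed(2)
before <- rnorm(20, mean = 500, sd = 90)
after  <- before + rnorm(20, mean = 15, sd = 30)

d <- before - after   # per-subject differences
n <- length(d)

# Equation 21.2: t(d) = (dbar / s_d) * sqrt(n)
t_manual <- mean(d) / sd(d) * sqrt(n)

# Built-in paired test
t_builtin <- unname(t.test(before, after, paired = TRUE)$statistic)

all.equal(t_manual, t_builtin)  # the two values agree
```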
Traditionally, the \(t\)-test is based on the assumptions of …
- Normality and
- Variance homogeneity (i.e., equal sample variances). Note that this does not apply to the \(t\)-test after Welch, which can handle unequal variances.
The implementation in R is very straightforward:
t.test(data_vowels$HZ_F1 ~ data_vowels$SEX, paired = FALSE) # there is a significant difference!
Welch Two Sample t-test
data: data_vowels$HZ_F1 by data_vowels$SEX
t = 2.4416, df = 112.19, p-value = 0.01619
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
8.403651 80.758016
sample estimates:
mean in group F mean in group M
528.8548 484.2740
If at least one assumption of the \(t\)-test is violated, it is advisable to use a non-parametric test such as the Wilcoxon-Mann-Whitney (WMW) U-test instead. In essence, this test compares the probability that a value \(x\) drawn from sample \(X\) exceeds a value \(y\) drawn from sample \(Y\) with the reverse probability. For details, see ?wilcox.test.
wilcox.test(data_vowels$HZ_F1 ~ data_vowels$SEX)
Wilcoxon rank sum test with continuity correction
data: data_vowels$HZ_F1 by data_vowels$SEX
W = 2270, p-value = 0.01373
alternative hypothesis: true location shift is not equal to 0
21.3 Workflow in R
21.3.1 Define hypotheses
\(H_0:\) mean F1 frequency of men \(=\) mean F1 frequency of women.
\(H_1:\) mean F1 frequency of men \(\ne\) mean F1 frequency of women.
21.3.2 Descriptive overview
We select the variables of interest and then calculate the mean F1 frequency for each level of SEX, which requires a grouped data frame.
# Filter data so as to show only those observations that are relevant
data_vowels %>%
  # Filter columns
  select(HZ_F1, SEX) %>%
  # Define grouping variable
  group_by(SEX) %>%
  # Compute mean and standard deviation for each sex
  summarise(mean = mean(HZ_F1),
            sd = sd(HZ_F1)) -> data_vowels_stats

knitr::kable(data_vowels_stats)
| SEX | mean | sd |
|---|---|---|
| F | 528.8548 | 110.80099 |
| M | 484.2740 | 87.90112 |
# Plot group means with error bars (mean ± sd)
data_vowels_stats %>%
  ggplot(aes(x = SEX, y = mean)) +
  geom_col() +
  geom_errorbar(aes(x = SEX,
                    ymin = mean - sd,
                    ymax = mean + sd), width = .2) +
  theme_classic()
# Plot quartiles
data_vowels %>%
  ggplot(aes(x = SEX, y = HZ_F1)) +
geom_boxplot() +
theme_classic()
21.3.3 Check \(t\)-test assumptions
# Normality
shapiro.test(data_vowels$HZ_F1) # H0: the data follow a normal distribution; note that this test can be unreliable (insensitive for small samples, oversensitive for large ones)
Shapiro-Wilk normality test
data: data_vowels$HZ_F1
W = 0.98996, p-value = 0.5311
# Check histogram
ggplot(data_vowels, aes(x = HZ_F1)) +
geom_histogram(bins = 30) +
theme_classic()
# Variance homogeneity
var.test(data_vowels$HZ_F1 ~ data_vowels$SEX) # H0: the variances are equal
F test to compare two variances
data: data_vowels$HZ_F1 by data_vowels$SEX
F = 1.5889, num df = 59, denom df = 59, p-value = 0.07789
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.949093 2.660040
sample estimates:
ratio of variances
1.588907
21.3.4 Running the test
# t-test for two independent samples
t.test(data_vowels$HZ_F1 ~ data_vowels$SEX, paired = FALSE) # there is a significant difference between sample means!
Welch Two Sample t-test
data: data_vowels$HZ_F1 by data_vowels$SEX
t = 2.4416, df = 112.19, p-value = 0.01619
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
8.403651 80.758016
sample estimates:
mean in group F mean in group M
528.8548 484.2740
21.3.5 Effect size
Cohen’s d is a possible effect size measure for continuous data and is obtained by dividing the difference of both sample means by the pooled standard deviation:
\[\frac{\bar{x} - \bar{y}}{\sqrt{\frac{{(n_1 - 1)s_x^2 + (n_2 - 1)s_y^2}}{{n_1 + n_2 - 2}}}}.\]
library("effsize") # assuming cohen.d() from the effsize package
cohen.d(data_vowels$HZ_F1, data_vowels$SEX) # see also ?cohen.d for more details
Cohen's d
d estimate: 0.4457697 (small)
95 percent confidence interval:
lower upper
0.07976048 0.81177897
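The pooled-standard-deviation formula above can also be computed by hand. The sketch below uses simulated toy samples (hypothetical, not the vowel data) and applies the formula term by term:

```r
# Cohen's d by hand on toy data (simulated; not the Apache vowel data)
set.seed(3)
x <- rnorm(60, mean = 530, sd = 110)
y <- rnorm(60, mean = 485, sd = 90)
n1 <- length(x)
n2 <- length(y)

# Pooled standard deviation from the denominator of the formula above
s_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))

# Cohen's d: difference of sample means divided by the pooled sd
d_manual <- (mean(x) - mean(y)) / s_pooled
d_manual
```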
21.3.6 Reporting the results
According to a two-sample \(t\)-test, there is a significant difference between the mean F1 frequencies
of male and female speakers of Apache (\(t = 2.44\), \(df = 112.19\), \(p < 0.05\)). Therefore, \(H_0\) is rejected.