20 t-test

20.1 Preparation

- Load packages and data:

library("readxl")
library("tidyverse")

data_vowels <- read.csv("Vowels_Apache.csv", sep = "\t")
20.2 The \(t\)-test
Since the \(\chi^2\) measure works exclusively with categorical variables, a different test statistic is required when one of the variables is continuous. The \(t\) statistic is often used for research questions involving differences between sample means. How \(t\) is calculated depends on the sources of \(X\) and \(Y\): do they originate from the same sample or from two (in)dependent ones?
First, we consider two independent samples from a population:
- Sample \(X\) with the observations \(\{x_1, x_2, \ldots, x_{n_1}\}\), sample size \(n_1\), sample mean \(\bar{x}\), and sample variance \(s^2_x\).
- Sample \(Y\) with the observations \(\{y_1, y_2, \ldots, y_{n_2}\}\), sample size \(n_2\), sample mean \(\bar{y}\), and sample variance \(s^2_y\).
The \(t\)-statistic after Welch is given by:
\[ t(x, y) = \frac{|\bar{x} - \bar{y}|}{\sqrt{\frac{s^2_x}{n_1} + \frac{s^2_y}{n_2}}} \tag{20.1}\]
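Equation 20.1 can be verified directly in R: the sketch below computes Welch's \(t\) by hand and compares it with the built-in t.test(). The vectors x and y are made-up toy samples for illustration, not the vowel data.

```r
# Hand-computed Welch t vs. t.test() (toy samples, not the Apache data)
x <- c(5.1, 4.8, 5.6, 5.0, 5.3)
y <- c(4.2, 4.9, 4.4, 4.6)

# Numerator: absolute difference of the sample means;
# denominator: square root of the summed variance/sample-size ratios
t_manual <- abs(mean(x) - mean(y)) /
  sqrt(var(x) / length(x) + var(y) / length(y))

# t.test() performs the Welch test by default (var.equal = FALSE)
t_builtin <- unname(t.test(x, y)$statistic)

all.equal(t_manual, abs(t_builtin))  # TRUE
```

Note that t.test() reports a signed statistic, so the manual value matches its absolute value.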
If there is more than one observation per subject (e.g., before and after an experiment), the samples are called dependent or paired. The paired \(t\)-test assumes two continuous variables \(X\) and \(Y\).
In the paired test, the variable \(d\) denotes the difference between them, i.e., \(x - y\). The corresponding test statistic is obtained via
\[ t(x, y) = t(d) = \frac{\bar{d}}{s_d} \sqrt{n}. \tag{20.2}\]
Here, \(\bar{d} = \frac{1}{n}\sum_{i=1}^n{d_i}\) denotes the mean difference, and the variance of the differences is
\[ s^2_d = \frac{\sum_{i=1}^n({d_i} - \bar{d})^2}{n-1}. \tag{20.3}\]
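As a quick check of Equations 20.2 and 20.3, the following sketch computes the paired statistic by hand and compares it with t.test(..., paired = TRUE). The before/after measurements are invented for illustration.

```r
# Paired t by hand, following Eq. 20.2: t(d) = (mean(d) / s_d) * sqrt(n)
# (invented before/after values, not the Apache data)
before <- c(612, 580, 603, 595, 588, 620)
after  <- c(598, 571, 590, 592, 575, 605)

d <- before - after   # pairwise differences
n <- length(d)

t_manual  <- mean(d) / sd(d) * sqrt(n)          # Eq. 20.2
t_builtin <- unname(t.test(before, after, paired = TRUE)$statistic)

all.equal(t_manual, t_builtin)  # TRUE
```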
Traditionally, the \(t\)-test is based on the assumptions of
- Normality and
- Variance homogeneity (i.e., equal sample variances). Note that the latter does not apply to the \(t\)-test after Welch, which can handle unequal variances.
The implementation in R is very straightforward:
t.test(data_vowels$HZ_F1 ~ data_vowels$SEX, paired = FALSE) # there is a significant difference!
Welch Two Sample t-test
data: data_vowels$HZ_F1 by data_vowels$SEX
t = 2.4416, df = 112.19, p-value = 0.01619
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
8.403651 80.758016
sample estimates:
mean in group F mean in group M
528.8548 484.2740
If at least one assumption of the \(t\)-test has been violated, it is advisable to use a non-parametric test such as the Wilcoxon-Mann-Whitney (WMW) U-test instead. In essence, this test compares the probability of encountering a value \(x\) from sample \(X\) that is greater than a value \(y\) from sample \(Y\). For details, see ?wilcox.test.
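In R, wilcox.test() offers the same two-sample (and formula) interface as t.test(). A minimal sketch with invented toy vectors (not the vowel data):

```r
# WMW U-test sketch; x and y are invented toy samples
x <- c(510, 545, 498, 530, 560, 520)
y <- c(470, 495, 480, 505, 460, 490)

res <- wilcox.test(x, y)
res$statistic  # W: the number of (x, y) pairs with x > y
res$p.value
```

With the real data, the analogous call would be wilcox.test(data_vowels$HZ_F1 ~ data_vowels$SEX).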
20.3 Workflow in R
20.3.1 Define hypotheses
\(H_0:\) mean F1 frequency of men \(=\) mean F1 frequency of women.

\(H_1:\) mean F1 frequency of men \(\ne\) mean F1 frequency of women.
20.3.2 Descriptive overview
We select the variables of interest and proceed to calculate the mean F1 frequencies for each level of SEX, which requires a grouped data frame.
# Filter data so as to show only those observations that are relevant
data_vowels %>%
  # Filter columns
  select(HZ_F1, SEX) %>%
  # Define grouping variable
  group_by(SEX) %>%
  # Compute mean and standard deviation for each sex
  summarise(mean = mean(HZ_F1),
            sd = sd(HZ_F1)) -> data_vowels_stats

knitr::kable(data_vowels_stats)
SEX | mean | sd |
---|---|---|
F | 528.8548 | 110.80099 |
M | 484.2740 | 87.90112 |
# Plot distributions
data_vowels_stats %>%
  ggplot(aes(x = SEX, y = mean)) +
  geom_col() +
  geom_errorbar(aes(x = SEX,
                    ymin = mean - sd,
                    ymax = mean + sd), width = .2) +
  theme_classic()
# Plot quartiles
data_vowels %>%
  ggplot(aes(x = SEX, y = HZ_F1)) +
  geom_boxplot() +
  theme_classic()
20.3.3 Check \(t\)-test assumptions
# Normality
shapiro.test(data_vowels$HZ_F1) # H0: data points follow the normal distribution
Shapiro-Wilk normality test
data: data_vowels$HZ_F1
W = 0.98996, p-value = 0.5311
# Check histogram
ggplot(data_vowels, aes(x = HZ_F1)) +
geom_histogram(bins = 30) +
theme_classic()
# Variance homogeneity
var.test(data_vowels$HZ_F1 ~ data_vowels$SEX) # H0: the variances of both groups are equal
F test to compare two variances
data: data_vowels$HZ_F1 by data_vowels$SEX
F = 1.5889, num df = 59, denom df = 59, p-value = 0.07789
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.949093 2.660040
sample estimates:
ratio of variances
1.588907
20.3.4 Running the test
# t-test for two independent samples
t.test(data_vowels$HZ_F1 ~ data_vowels$SEX, paired = FALSE) # there is a significant difference!
Welch Two Sample t-test
data: data_vowels$HZ_F1 by data_vowels$SEX
t = 2.4416, df = 112.19, p-value = 0.01619
alternative hypothesis: true difference in means between group F and group M is not equal to 0
95 percent confidence interval:
8.403651 80.758016
sample estimates:
mean in group F mean in group M
528.8548 484.2740
20.3.5 Effect size
Cohen’s d is a possible effect size measure for continuous data and is obtained by dividing the difference of both sample means by the pooled standard deviation:
\[\frac{\bar{x} - \bar{y}}{\sqrt{\frac{{(n_1 - 1)s_x^2 + (n_2 - 1)s_y^2}}{{n_1 + n_2 - 2}}}}.\]
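The formula can be computed step by step, which makes the role of the pooled standard deviation explicit. The sketch below uses small invented samples rather than the vowel data.

```r
# Cohen's d by hand via the pooled standard deviation
# (toy samples, not the Apache data)
x <- c(5.1, 4.8, 5.6, 5.0, 5.3)
y <- c(4.2, 4.9, 4.4, 4.6)
n1 <- length(x)
n2 <- length(y)

# Pooled SD: weighted average of both sample variances
s_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) /
                 (n1 + n2 - 2))

# Effect size: mean difference in units of the pooled SD
d <- (mean(x) - mean(y)) / s_pooled
d
```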
library("effsize")  # provides cohen.d()
cohen.d(data_vowels$HZ_F1, data_vowels$SEX) # see also ?cohen.d for more details
Cohen's d
d estimate: 0.4457697 (small)
95 percent confidence interval:
lower upper
0.07976048 0.81177897
20.3.6 Reporting the results
According to a Welch two-sample \(t\)-test, there is a significant difference between the mean F1 frequencies of male and female speakers of Apache (\(t = 2.44\), \(df = 112.19\), \(p < 0.05\)). \(H_0\) is therefore rejected.