28 Exploratory Factor Analysis

Author: Vladimir Buskin

Affiliation: Catholic University of Eichstätt-Ingolstadt

28.1 Recommended reading

For linguists:

Levshina (2015: Chapter 18)

General:

Mair (2018: Chapter 2)

28.2 Preparation

The implementation of Exploratory Factor Analysis in R is very similar to that of Principal Component Analysis (PCA). To highlight these similarities, we will use the same libraries (most importantly psych) and the same dataset scope_sem_sub as in the unit on PCA (see Section 27.2 for further details).

# Load libraries
library(tidyverse)
library(purrr)
library(psych)
library(GPArotation)
library(gridExtra)

# Load data
scope_sem_df <- readRDS("scope_sem.RDS")

# Select subset
scope_sem_sub <- scope_sem_df[,1:11]

# Overview
glimpse(scope_sem_sub)

Rows: 1,702
Columns: 11
$ Verb               <chr> "abstain", "abstract", "abuse", "accelerate", "acce…
$ Resnik_strength    <dbl> 0.40909889, 0.18206692, 0.12473608, -0.76972217, -1…
$ Conc_Brys          <dbl> -0.94444378, -1.92983639, -0.59478833, 0.22107437, …
$ Nsenses_WordNet    <dbl> -0.68843996, 0.27755219, 0.00155443, -0.68843996, 0…
$ Nmeanings_Websters <dbl> -0.95559835, 0.73781281, 0.73781281, -0.27823388, 0…
$ Visual_Lanc        <dbl> -2.2545455, 0.6103733, 1.3354358, -0.4342084, -0.34…
$ Auditory_Lanc      <dbl> -0.84225787, -0.35605108, 1.54797548, 0.18795651, 1…
$ Haptic_Lanc        <dbl> -0.75523987, -0.29089287, 1.25099360, -0.18911818, …
$ Olfactory_Lanc     <dbl> -0.14444936, -0.37350419, -0.53335522, -0.37350419,…
$ Gustatory_Lanc     <dbl> 0.27698988, -0.10105698, -0.36148925, -0.52110903, …
$ Interoceptive_Lanc <dbl> 1.08153427, -0.06560311, 1.64313895, 1.45452985, 0.…

28.3 Exploratory Factor Analysis vs. PCA

Exploratory Factor Analysis (EFA) is quite similar to PCA in that it compresses a high-dimensional feature space. The core idea, however, is not to capture as much variance as possible with as few variables as possible, but rather to reveal latent (= invisible) variables, i.e., factors.

The computation bears some resemblance to that of PCA. The main difference is that an observed variable \(x_m\) is assumed to be generated by a linear combination of the underlying factors \(\xi_{1}, \xi_{2}, \dots, \xi_{p}\), weighted by the factor loadings \(\lambda_{m1}, \lambda_{m2}, \dots, \lambda_{mp}\), plus an error term \(\epsilon_m\) (see Equation 28.1). Everything on the right-hand side of the equation is unobserved and can only be obtained by running estimation procedures such as Principal Axis Factoring or Maximum Likelihood Estimation.

\[ x_1 = \lambda_{11}\xi_{1} + \lambda_{12}\xi_{2} + \dots + \lambda_{1p}\xi_{p} + \epsilon_1 \tag{28.1}\]
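
Writing Equation 28.1 for all observed variables at once gives the usual matrix form of the factor model. The notation below is introduced here for illustration and is not part of the original text: \(\boldsymbol{\Lambda}\) collects the loadings (one row per observed variable, one column per factor) and \(\boldsymbol{\Psi}\) is the diagonal matrix of error variances. The covariance identity additionally assumes, as is standard in textbook treatments, that the factors are standardised, uncorrelated with one another, and uncorrelated with the error terms.

\[ \mathbf{x} = \boldsymbol{\Lambda}\boldsymbol{\xi} + \boldsymbol{\epsilon}, \qquad \operatorname{Cov}(\mathbf{x}) = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi} \]

Under these assumptions, the estimation procedures search for loadings and error variances whose implied covariance matrix is as close as possible to the one observed in the data.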

When retrieving PCA and EFA loadings, several interpretive differences must be kept in mind:

Key differences between EFA and PCA

PCA: PCA weights can be conceptualised as “directions in feature space along which the data vary the most” (James et al. 2021: 503) and are analogous to regression slopes. Features with similar loadings on a given PC will be very close to each other in a biplot and could be understood as correlated with each other.
EFA: The factor loadings in an EFA, on the other hand, directly indicate how strongly a factor is correlated with an observed variable in the dataset. As such, they help identify and interpret the underlying constructs that have given rise to the data. As long as the variables are standardised and the factors are uncorrelated, we can think of EFA loadings as regression coefficients and correlation coefficients at the same time.
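
The reading of EFA loadings as correlations can be checked on simulated data. The following is a minimal sketch rather than part of the original analysis: it assumes standardised observed variables and two uncorrelated factors, and the values in `Lambda` as well as all object names are made up for illustration.

```r
set.seed(42)
n <- 20000

# Hypothetical loading matrix: 6 observed variables, 2 factors
Lambda <- matrix(c(0.8, 0.7, 0.6, 0.0, 0.0, 0.0,
                   0.0, 0.0, 0.0, 0.8, 0.7, 0.6),
                 ncol = 2,
                 dimnames = list(paste0("x", 1:6), c("xi1", "xi2")))

# Uncorrelated standard-normal factors
xi <- matrix(rnorm(n * 2), ncol = 2,
             dimnames = list(NULL, c("xi1", "xi2")))

# Error variances chosen so that every observed variable has variance 1
psi <- 1 - rowSums(Lambda^2)
eps <- sweep(matrix(rnorm(n * 6), ncol = 6), 2, sqrt(psi), `*`)

# Equation 28.1 applied to all six variables at once
X <- xi %*% t(Lambda) + eps
colnames(X) <- rownames(Lambda)

# Correlations between observed variables and factors
round(cor(X, xi), 2)
```

Up to sampling noise, the correlation matrix should reproduce `Lambda`. In a real analysis the factors are of course not observed, which is why the loadings have to be estimated.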

28.4 Application in R

We build on our insights from the PCA, according to which three latent variables are enough to capture the bulk of the variance in the dataset. The model below is fitted with principal axis factoring (fm = "pa"); setting fm = "ml" instead performs Maximum Likelihood Estimation. Note that fa() falls back on the minimum residual method (fm = "minres") if fm is not specified.

efa1 <- fa(scope_sem_sub[,-1], nfactors = 3, rotate = "none", fm = "pa")
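
To get a feeling for how much the choice of estimation method matters, the two options can be fitted side by side and compared. The sketch below makes some assumptions that are not part of the original analysis: it uses the built-in correlation matrix `Harman74.cor` as a stand-in for `scope_sem_sub[,-1]` so that it runs without the SCOPE data, and it compares the solutions with `factor.congruence()` from psych; the object names are illustrative.

```r
library(psych)

# Built-in correlation matrix of 24 ability tests (n = 145);
# stand-in for scope_sem_sub[,-1]
R_mat <- Harman74.cor$cov
n_obs <- Harman74.cor$n.obs

# Same model, two estimation methods
efa_pa <- fa(R_mat, nfactors = 3, n.obs = n_obs, rotate = "none", fm = "pa")
efa_ml <- fa(R_mat, nfactors = 3, n.obs = n_obs, rotate = "none", fm = "ml")

# Congruence coefficients between the two sets of loadings
cong <- factor.congruence(efa_pa, efa_ml)
cong
```

Coefficients close to 1 or -1 indicate that a factor from one solution has a near-identical counterpart in the other. Unrotated solutions need not line up one-to-one, so the table is best read as a rough diagnostic.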

The remaining printing and plotting methods are identical to PCA.

Print loadings:

loadings(efa1)


Loadings:
                   PA1    PA2    PA3   
Resnik_strength            0.419 -0.269
Conc_Brys           0.800  0.228 -0.300
Nsenses_WordNet     0.560 -0.628  0.236
Nmeanings_Websters  0.495 -0.555  0.233
Visual_Lanc         0.576  0.141 -0.255
Auditory_Lanc      -0.270 -0.132  0.153
Haptic_Lanc         0.608  0.123       
Olfactory_Lanc      0.291  0.482  0.441
Gustatory_Lanc      0.263  0.513  0.657
Interoceptive_Lanc -0.245         0.386

                 PA1   PA2   PA3
SS loadings    2.196 1.482 1.143
Proportion Var 0.220 0.148 0.114
Cumulative Var 0.220 0.368 0.482
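
The three summary rows are linked by simple arithmetic: the proportion of variance is the sum of squared loadings divided by the number of observed variables (here 10), and the cumulative row is its running total. A minimal sketch using the values printed above (the object names are illustrative):

```r
# SS loadings as printed in the output above
ss_loadings <- c(PA1 = 2.196, PA2 = 1.482, PA3 = 1.143)

# Number of observed variables entering the EFA (all columns except Verb)
n_vars <- 10

prop_var <- ss_loadings / n_vars   # Proportion Var
cum_var  <- cumsum(prop_var)       # Cumulative Var

round(rbind(prop_var, cum_var), 3)
```

Rounded to three decimals, this reproduces the Proportion Var and Cumulative Var rows: together the three factors account for roughly 48% of the variance.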

Plot loadings:

plot(efa1, labels = colnames(scope_sem_sub[,-1]), main = NA)

Plot PA scores and loadings:

biplot(efa1, choose = c(1, 2), main = NA,
       pch = 20, col = c("darkgrey", "blue"))

biplot(efa1, choose = c(2, 3), main = NA,
       pch = 20, col = c("darkgrey", "blue"))

28.4.1 Rotation

Factors are typically rotated to aid their interpretation, which usually results in much clearer loading patterns. Varimax is a widely used rotation technique and does not affect the model fit, i.e., there is no loss in explained variance (for details, see Mair 2018: 26-29).¹ It has to be requested explicitly, since fa() applies the oblique rotation "oblimin" if rotate is not specified.

¹ Varimax is a so-called orthogonal rotation technique and, therefore, does not introduce correlations between the factors. If correlated factors are explicitly desired, oblique rotations such as oblimin and promax provide apt alternatives (Mair 2018: 27).

efa2 <- fa(scope_sem_sub[,-1], nfactors = 3, rotate = "Varimax", fm = "pa")

loadings(efa2)


Loadings:
                   PA1    PA2    PA3   
Resnik_strength     0.188 -0.470       
Conc_Brys           0.868  0.130  0.107
Nsenses_WordNet     0.145  0.861       
Nmeanings_Websters  0.115  0.771       
Visual_Lanc         0.638              
Auditory_Lanc      -0.336              
Haptic_Lanc         0.572  0.192  0.165
Olfactory_Lanc      0.164         0.694
Gustatory_Lanc                    0.873
Interoceptive_Lanc -0.410         0.199

                 PA1   PA2   PA3
SS loadings    1.868 1.627 1.326
Proportion Var 0.187 0.163 0.133
Cumulative Var 0.187 0.350 0.482
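
Note that the cumulative variance (0.482) is the same as in the unrotated solution: the rotation only redistributes the explained variance across the factors (1.868 + 1.627 + 1.326 = 2.196 + 1.482 + 1.143 = 4.821). The following sketch illustrates why. It deliberately swaps in base R tools, namely `factanal()` (maximum likelihood) and `varimax()` from the stats package applied to the built-in `Harman74.cor` data, so it is an illustration of the principle and not a re-run of the analysis above.

```r
# Unrotated three-factor solution on a built-in correlation matrix
fit <- factanal(covmat = Harman74.cor, factors = 3, rotation = "none")
L_unrot <- unclass(loadings(fit))

# Varimax rotation of the loading matrix
rot <- varimax(L_unrot)
L_rot <- unclass(rot$loadings)

# The rotation matrix is orthogonal: its cross-product is the identity
round(crossprod(rot$rotmat), 10)

# Communalities (row sums of squared loadings) are unchanged ...
all.equal(rowSums(L_unrot^2), rowSums(L_rot^2))

# ... and so is the total of the SS loadings, although the
# column-wise values differ
rbind(unrotated = colSums(L_unrot^2), rotated = colSums(L_rot^2))
c(unrotated = sum(L_unrot^2), rotated = sum(L_rot^2))
```

Because an orthogonal rotation merely re-expresses the same factor space in new coordinates, each variable's explained variance stays the same while the individual loadings change.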

The rotated EFA object paints a picture that is very similar to the PCA result from the previous unit.

diagram(efa2, main = NA)

biplot(efa2, choose = c(1, 2), main = NA,
       pch = 20, col = c("darkgrey", "blue"))

biplot(efa2, choose = c(2, 3), main = NA,
       pch = 20, col = c("darkgrey", "blue"))
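
If correlated factors are desired, an oblique rotation can be requested in the same way. The sketch below is a hedged illustration rather than part of the original analysis: it again uses the built-in `Harman74.cor` correlation matrix as a stand-in for `scope_sem_sub[,-1]`, and the object name `efa_obl` is made up.

```r
library(psych)
library(GPArotation)  # provides the oblimin rotation used by fa()

# Oblique three-factor solution on a built-in correlation matrix
efa_obl <- fa(Harman74.cor$cov, nfactors = 3, n.obs = Harman74.cor$n.obs,
              rotate = "oblimin", fm = "pa")

# Estimated correlations between the factors
round(efa_obl$Phi, 2)
```

Off-diagonal values of `Phi` that are clearly different from zero suggest that the assumption of uncorrelated factors made by varimax may be too strict for the data at hand.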

Interpreting the EFA output

Perception: The first principal axis is once more characterised by high positive loadings of concreteness scores as well as visual and haptic ratings. Moreover, these variables display strong linear relationships. The negative association with interoceptive ratings suggests that referents that tend to be perceived directly with the senses (concreteness) tend not to be perceived inside the body.
Senses: In PA2 we find the inverse pattern of PC2 – very strong positive correlations with sense-related features and a weaker, yet notable negative correlation with selectional preference strength. If a verb has more senses, it tends to carry less information about its context.
Ingestion: In PA3, interoceptive ratings are no longer part of the picture, thus giving way to the gustatory and olfactory perception of referents.