Statistics and Data Analysis for Corpus Linguists: From Theory to Practice with R

Authors and affiliations

Vladimir Buskin, Catholic University of Eichstätt-Ingolstadt

Thomas Brunner, Catholic University of Eichstätt-Ingolstadt

Philippa Adolf, University of Vienna

Preface

This collection of handouts provides a hands-on introduction to data analysis and statistical methods in quantitative corpus linguistics with R. It is designed with accessibility in mind and assumes no prior knowledge of programming or statistics. All you need to get started is a laptop; everything else is explained in these pages.

This reader is primarily geared towards students attending the classes Language Variation (BA) and Statistics for Linguistics (MA) at the Catholic University of Eichstätt-Ingolstadt (Germany). However, it is also meant to equip students currently working on their BA, MA, or PhD theses with the tools they need to conduct empirical studies on a wide array of linguistic phenomena. The methods presented here reflect the current state of the art in corpus-linguistic research.

Overview

We begin by establishing the general structure of corpus-linguistic studies, along with key theoretical and practical considerations. The second chapter introduces R as an analytical tool and covers its core functionality. With these fundamentals in place, the third chapter shows how to query corpora directly within R. Chapters four and five focus on describing discrete and continuous variables and on identifying meaningful associations and differences in the data. Finally, to gain more nuanced insights into potential patterns, we use common machine learning algorithms to fit, evaluate, and interpret statistical models.

Along the way, readers will develop the skills to conduct robust corpus-linguistic analyses, from basic querying to advanced statistical modeling.

Collaborators

Vladimir Buskin, PhD student at the Department of English Language and Linguistics, Catholic University of Eichstätt-Ingolstadt

Thomas Brunner, Assistant Professor at the Department of English Language and Linguistics, Catholic University of Eichstätt-Ingolstadt

Philippa Adolf, PhD student at the Department of Romance Studies, University of Vienna