Statistics and Data Analysis for Corpus Linguists: From Theory to Practice with R
Preface
This collection of handouts provides a hands-on introduction to data analysis and statistical methods in quantitative corpus linguistics with R. It is designed to be accessible and assumes no prior knowledge of programming or statistics. All you need to get started is a laptop; everything else will be explained within these pages.
This reader is geared primarily towards students attending the classes Language Variation (BA) and Statistics for Linguistics (MA) at the Catholic University of Eichstätt-Ingolstadt (Germany). However, it is also meant to equip students currently working on their BA, MA, or PhD theses with the tools they need to conduct empirical studies on a wide array of linguistic phenomena. The methods presented here reflect the state of the art in corpus-linguistic research, providing readers with current and relevant analytical techniques.
Overview
We begin by establishing the general structure of corpus-linguistic studies, accompanied by key theoretical and practical considerations. The second chapter introduces R as an analytical tool, covering its core functionality. With these fundamentals in place, we proceed to query corpora directly within R. Chapters four and five focus on describing discrete and continuous outputs, as well as identifying meaningful associations and differences in the data. Finally, to gain more nuanced insights into potential patterns, we apply common machine learning algorithms to fit, evaluate, and interpret statistical models.
Throughout this journey, readers will develop the skills to conduct robust corpus-linguistic analyses, from basic querying to advanced statistical modeling.
Collaborators
Vladimir Buskin, PhD student at the Department of English Language and Linguistics, Catholic University of Eichstätt-Ingolstadt
Thomas Brunner, Assistant Professor at the Department of English Language and Linguistics, Catholic University of Eichstätt-Ingolstadt
Philippa Adolf, PhD student at the Department of Romance Studies, University of Vienna