Statistics for Corpus Linguists

2.1 First steps

Author: Vladimir Buskin
Affiliation: Catholic University of Eichstätt-Ingolstadt

1 Why learn R to begin with?

When it comes to data analysis, learning R offers an overwhelming number of short- and long-term advantages over conventional spreadsheet software such as Microsoft Excel or LibreOffice Calc:

  • First of all, it’s completely free. There’s no need to obtain any expensive licenses, as is the case for commercial software such as SPSS or MS Excel.

  • R makes it very easy to document and share every step of the analysis, thereby facilitating reproducible workflows.

  • Large (and by that I mean extremely large!) datasets pose few problems as long as they fit into your computer’s memory. Loading tabular data with hundreds of thousands (or even millions) of rows typically takes only a few seconds, whereas spreadsheet software often becomes sluggish or crashes at that scale.

  • There are numerous extensions that provide tailored functions for corpus linguistics that aren’t available in general-purpose spreadsheet software. This allows us to work with corpora, use complex search expressions, and perform part-of-speech annotation, dependency parsing, and much more – all from within R.

  • R’s ggplot2 offers an incredibly powerful framework for data visualisation. Don’t believe it? Check out the ggplot2 gallery.

  • The CRAN repository features more than 20,000 packages that can be installed to expand the functionality of R almost indefinitely. Should none of them meet your needs, R gives you the tools to comfortably write and share your own functions and packages.
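To make the point about documented, reproducible steps more concrete, here is a minimal sketch in base R. The data frame `corpus_sizes`, its register labels, and its token counts are invented purely for illustration and do not come from a real corpus.

```r
# Hypothetical toy data: token counts for three made-up registers
corpus_sizes <- data.frame(
  register = c("spoken", "fiction", "news"),
  tokens   = c(120000, 250000, 310000)
)

# Each step is written down as code, so the analysis can be rerun and shared
total_tokens <- sum(corpus_sizes$tokens)

# Proportion of the (hypothetical) corpus contributed by each register
corpus_sizes$share <- corpus_sizes$tokens / total_tokens

print(corpus_sizes)
```

Because the whole analysis lives in a plain-text script, anyone with R installed can rerun it line by line and arrive at the same table.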

2 Installing R

The first step involves downloading the R programming language itself from the homepage of the Comprehensive R Archive Network (CRAN), which offers the binary distribution. Choose the one that corresponds to your operating system (Windows/macOS/Linux).

Installation instructions for Windows users

Click “Download R for Windows” → Select “base” → Click on “Download R-4.4.1 for Windows” (or whichever version is currently the most recent).

Open the set-up file you’ve just downloaded and simply follow the instructions on screen. It’s fine to go with the default options.

Video tutorial on YouTube

Installation instructions for macOS users

Click “Download R for macOS” → Select the latest release for your OS.

Open the downloaded .pkg file and follow the instructions in the installation window.

Video tutorial on YouTube
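On either operating system, one way to check that the installation worked is to open the R console that ships with R and run a few lines. The sketch below uses only base R; the variable names are arbitrary, and the 4.0.0 threshold is just an example for the comparison, not a requirement of this course.

```r
# Version string of the R installation that is currently running
r_version <- R.version.string
print(r_version)

# getRversion() returns the version in a form that can be compared
is_r4_or_later <- getRversion() >= "4.0.0"
print(is_r4_or_later)

# A first calculation to confirm that the console evaluates input
two_plus_two <- 2 + 2
print(two_plus_two)
```

If the console prints a version string and the result of the calculation, R is ready to use.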

3 Installing RStudio

You can now download and install RStudio. RStudio is a so-called “Integrated Development Environment” (IDE), which provides a variety of helpful tools for writing and editing code comfortably. If R were a musical instrument, then RStudio would be the recording studio, so to speak.
