library(readxl)
library(writexl)
10 Import/export data
You can find the full R script associated with this unit here.
10.1 Recommended reading
Winter (2020): Chapter 1.11
10.2 Preparation
The first section of an R script should always specify the libraries that are needed for executing the code to follow. In this unit, we will need readxl
and writexl
to aid us with importing MS Excel files.
If you haven’t installed them yet, the R console will throw an error message. For instructions on how to install an R package, consult the unit on Libraries.
10.2.1 Exporting data
Assume we’d like to export our data frame with word frequencies to a local file on our system. Let’s briefly regenerate the data frame:
# Generate data frame
<- data.frame(lemma = c("start", "enjoy", "begin", "help"),
data frequency = c(418, 139, 337, 281))
# Print contents
print(data)
lemma frequency
1 start 418
2 enjoy 139
3 begin 337
4 help 281
There are two common formats in which tabular data can be stored:
as .csv-files (‘comma-separated values’; native format of LibreOffice Calc)
as .xls/.xlsx-files (Microsoft Excel)
To save our data
data frame in .csv-format, we can use the write_table()
function:
write.csv(data, "frequency_data.csv")
The file is now stored at the location of your current R script. You can open this file …
in LibreOffice
in Microsoft Excel via
File
>Import
>CSV file
> Select the file >Delimited
and thenNext
>Comma
andNext
>General
andFinish
.
Clearly, opening CSV files in MS Excel is quite cumbersome, which is why it’s better to export it as an Excel file directly.
We use the write_xlsx()
function provided by the package writexl
.
write_xlsx(data, "frequency_data.xlsx")
The file is now stored at the location of your currently active R script. You should now be able to open it in MS Excel without any issues.
10.2.2 Importing data
Let’s read the two files back into R.
To import the CSV file, we can use the read.csv()
function:
<- read.csv("frequency_data.csv")
imported_csv print(imported_csv)
X lemma frequency
1 1 start 418
2 2 enjoy 139
3 3 begin 337
4 4 help 281
It appears that read.csv()
has also written the row numbers to the file. This is not the desired outcome and can be prevented by adding an additional argument:
<- read.csv("frequency_data.csv", row.names = 1)
imported_csv print(imported_csv) # Problem solved!
lemma frequency
1 start 418
2 enjoy 139
3 begin 337
4 help 281
When working with CSV files, you may encounter issues with character encodings and separators, especially when:
- working with files from different operating systems,
- dealing with text containing special characters (é, ü, ñ, etc.), or
- importing files created in different regions (e.g., European vs. US).
The most common encoding-related parameters for read.csv() are:
# For files with special characters (recommended default)
<- read.csv("myfile.csv", encoding = "UTF-8")
data
# For files from Windows systems
<- read.csv("myfile.csv", encoding = "latin1")
data
# For files using semicolons and commas as decimal points
<- read.csv("myfile.csv", sep = ";", dec = ",") data
- If you see garbled text like é instead of é, try specifying
encoding = "UTF-8"
. - If your data appears in a single column, check if your file uses semicolons (
;
) instead of commas (,
) as separators. - If numeric values are incorrect, verify whether the file uses commas or periods as decimal separators.
For importing the Excel file, we’ll use the read_xlsx()
function from the readxl
package:
<- read_xlsx("frequency_data.xlsx")
imported_excel print(imported_excel)
# A tibble: 4 × 2
lemma frequency
<chr> <dbl>
1 start 418
2 enjoy 139
3 begin 337
4 help 281
That’s it! Nevertheless, remember to always check your imported data to ensure it has been read in correctly, especially when working with CSV files.
10.3 A convenient alternative: RDS files
If the main goal is to save an intermediary result and make it available for later use, the most efficient solution is to save the object to a local R data file ending in .RDS
. Since it compressed data, .RDS files can be considered analogous to .zip files, which are very commonly used for other data types.
In practice, we use the saveRDS()
function and supply it with …
… an R object (e.g., a vector, data frame, matrix, graphs, statistical models – anything goes!) as well as
… the desired name of the file.
# Save data frame "data" to the file "frequency_data.RDS"
saveRDS(data, "frequency_data.RDS")
To read a file back in, we need to indicate the file name (or the full file path if the file is located in a different folder).
# Read in "frequency_data.RDS" and assign the contents to "data2"
<- readRDS("frequency_data.RDS")
data2
# Verify contents
print(data2)
lemma frequency
1 start 418
2 enjoy 139
3 begin 337
4 help 281
10.4 Troubleshooting
Importing data into R is often a challenging and error-prone task, made more difficult by the wide range of potential issues. AI tools can be immensely helpful for identifying and addressing problems specific to your use case. This handout provides some practical guidance, including tips for resolving file-loading errors.
10.5 Exercises
You can find the solutions to the exercises here.
Exercise 10.1 Download the file SCOPE_reduced.RDS
from this repository and read it into a variable named SCOPE
. It contains data from the the South Carolina Psycholinguistic Metabase (Gao, Shinkareva, and Desai 2022), specifically:
Number of meanings (
Nsenses_WordNet
)Emotional valence ratings, which describe the pleasantness of a lexical stimulus on a scale from 1 to 9 (
Valence_Warr
)Data for nearly 200,000 words
Exercise 10.2 Using this database, retrieve
- the number of meanings for the verbs start, enjoy, begin, help. Store them in a data frame with the name
senses_df
. - emotional valence ratings for the words fun, love, vacation, war, politics, failure, table. Store them in a data frame, and name it
valence_df
.
What do you notice about the valence ratings? Do they align with your intuitions about these words’ emotional content?
This task is very similar to Exercise 8.3!
Exercise 10.3 Export senses_df
and valence_df
both as .csv and .xlsx files, and read them back into R.
Exercise 10.4 Separators determine how tabular data is stored internally. Investigate what happens when you read in frequency_data.csv
with different separator settings:
# Comma separator (default)
<- read.csv("frequency_data.csv", sep = ",")
imported_csv1 print(imported_csv1)
# Tab separator
<- read.csv("frequency_data.csv", sep = "\t")
imported_csv2 print(imported_csv2)
# Semi-colon separator
<- read.csv("frequency_data.csv", sep = ";")
imported_csv3 print(imported_csv3)