Importing data

Jeff Stevens

2023-02-08

Review

Mental model of data in R

Mental model of data analysis

Mental model of importing data

Data files

File types

Excel (.xls/.xlsx): Binary matrix file with formatting, formulas, multiple sheets
Comma-separated values (.csv): Plain text matrix file without formatting, etc. (also TSV)
Other program-specific files: SPSS, SAS, etc.
Text files (.txt): Plain text file of raw text
Start saving CSVs and convert other formats to CSVs

Dog data

Download data for dog breed popularity.
Create data/ directory in your dpavir2023 course directory.
Save dog_breed_popularity.csv into the data/ directory.
View file in RStudio file manager
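
The directory setup above can also be done from the R console. This is a minimal sketch, assuming a temporary folder stands in for the real dpavir2023 course directory; the names course_dir, data_dir, and csv_path are illustrative only.

```r
# Stand-in for the course directory (assumption: yours is dpavir2023/ somewhere on disk)
course_dir <- file.path(tempdir(), "dpavir2023")
data_dir <- file.path(course_dir, "data")

# recursive = TRUE also creates the parent directory;
# showWarnings = FALSE keeps reruns quiet if the directory already exists
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

# Once the downloaded file has been saved here, this returns TRUE
csv_path <- file.path(data_dir, "dog_breed_popularity.csv")
file.exists(csv_path)
```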

Importing CSV files

Base R data import

read.csv()

Wrapper around read.table()

Defaults

Header row (turn off with header = FALSE)
Comma separated (change with sep = ";" or use read.csv2(), which also expects a comma as the decimal mark)
Outputs data frame
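
A short illustration of these defaults, using made-up data passed through the text argument so that no file is needed. The object names are hypothetical.

```r
# Made-up data supplied as strings instead of files
comma_text <- "breed,rank\nLabrador,1\nBeagle,2"
semicolon_text <- "Labrador;1\nBeagle;2"

# Defaults: the first row is the header and fields are comma separated
with_header <- read.csv(text = comma_text)

# No header row, semicolon-separated fields
no_header <- read.csv(text = semicolon_text, header = FALSE, sep = ";")

class(with_header)  # a data frame
names(no_header)    # R supplies default names when there is no header
```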

Usage:

read.csv(file = "path/to/file.csv")

library(here)
mydf <- read.csv(here("data/dog_breed_popularity.csv"))

{tidyverse} data import

{readr}

readr::read_csv()

Control column names with col_names (including renaming)
Control column types with col_types
Control missing values with na (quoted_na is deprecated as of readr 2.0.0)
Can skip rows before reading data with skip or cut off with n_max
Outputs tibble
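
The arguments above can be combined in one call. This is a sketch on made-up data, assuming readr 2.0 or later (needed for passing literal text with I()); the column name popularity and the "unknown" missing-value code are invented for illustration.

```r
library(readr)

# Made-up file contents: one junk line, a header row, then three data rows
csv_text <- "exported 2023-02-08\nbreed,rank\nLabrador,1\nBeagle,unknown\nPoodle,3"

dogs <- read_csv(
  I(csv_text),                           # I() marks the string as literal data
  skip = 2,                              # skip the junk line and the original header
  col_names = c("breed", "popularity"),  # supply (renamed) column names
  col_types = "ci",                      # character, then integer
  na = c("", "NA", "unknown"),           # strings to treat as missing
  n_max = 2                              # read at most two data rows
)

dogs
```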

Usage:

read_csv(file = "path/to/file.csv")

library(readr)
mydf2 <- read_csv(here("data/dog_breed_popularity.csv"))

Importing from URLs

Both read.csv() and read_csv() can import CSV files hosted online: pass the URL in place of the file path.

https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits.csv

mydf3 <- read.csv("https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits.csv")
mydf4 <- read_csv("https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits.csv")

Exporting CSVs

write.csv()

Character/factor columns in quotes with quote = TRUE
Remove row names with row.names = FALSE (col.names cannot be changed in write.csv(); use write.table() to drop column names)

readr::write_csv()

Characters are only quoted if they contain a comma, quote, or new line
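
To compare the quoting behavior of the two functions, the sketch below writes the same made-up data frame both ways to temporary files and reads the raw lines back. The data and file names are hypothetical.

```r
library(readr)

# Made-up data: one breed name contains a comma
dogs <- data.frame(breed = c("Beagle", "Retriever, Golden"), rank = c(2, 1))

base_file <- tempfile(fileext = ".csv")
readr_file <- tempfile(fileext = ".csv")

write.csv(dogs, base_file, row.names = FALSE)  # quotes every character value
write_csv(dogs, readr_file)                    # quotes only where needed

# Inspect the raw text that each function produced
base_lines <- readLines(base_file)
readr_lines <- readLines(readr_file)
base_lines
readr_lines
```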

Usage

write.csv(df, file = "path/to/file.csv")

write_csv(df, file = "path/to/file.csv")

write.csv(mydf, here("data/newdata.csv"))
write_csv(mydf, here("data/newdata2.csv"))

Importing other files

Excel data

Import Excel data with {readxl}

Functions: read_xls(), read_xlsx(), read_excel()
Specify the sheet with the sheet argument
Specify subset of cells with range argument
Like read_csv(), has col_names, col_types, na, skip, n_max
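
A small sketch of the sheet and range arguments, assuming the example workbook datasets.xlsx that ships with {readxl} (it is not part of the course data).

```r
library(readxl)

# Example workbook bundled with {readxl}
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheet names available in the workbook
excel_sheets(xlsx_path)

# Read only cells A1:C6 of the "mtcars" sheet:
# one header row plus five data rows, three columns
cars <- read_excel(xlsx_path, sheet = "mtcars", range = "A1:C6")
cars
```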

Usage:

read_excel(path = "path/to/file.xlsx")

library(readxl)
mydf5 <- read_excel(here("data/dog_breed_data.xlsx"), sheet = "Sheet2")

Other stats packages

Import SPSS, SAS, & Stata data with {haven}

SPSS

haven::read_sav("mtcars.sav")

SAS

haven::read_sas("mtcars.sas7bdat")

Stata

haven::read_dta("mtcars.dta")
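
The files named above are not provided here, so this sketch first creates them: it writes the built-in mtcars data to temporary SPSS and Stata files with haven's write functions and then reads them back. The temporary paths are assumptions for illustration.

```r
library(haven)

# Temporary files stand in for real SPSS and Stata exports
sav_path <- tempfile(fileext = ".sav")
dta_path <- tempfile(fileext = ".dta")

# Create the files from the built-in mtcars data
write_sav(mtcars, sav_path)
write_dta(mtcars, dta_path)

# Read them back; both return tibbles
cars_spss <- read_sav(sav_path)
cars_stata <- read_dta(dta_path)
```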

Qualtrics data

Import Qualtrics data directly with {qualtRics}

Register your Qualtrics credentials with qualtRics::qualtrics_api_credentials()
Get survey ID by viewing qualtRics::all_surveys()
Import data with qualtRics::fetch_survey()
Never have to download Qualtrics data again!
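
The steps above might look roughly like the sketch below. It assumes credentials were already registered and are available in the QUALTRICS_API_KEY and QUALTRICS_BASE_URL environment variables, and it does nothing if they are missing. Picking the first survey in the list is a hypothetical choice for illustration.

```r
# Assumption: credentials were registered earlier, e.g. with
# qualtRics::qualtrics_api_credentials(), and live in these environment variables
has_credentials <- nzchar(Sys.getenv("QUALTRICS_API_KEY")) &&
  nzchar(Sys.getenv("QUALTRICS_BASE_URL"))

if (has_credentials && requireNamespace("qualtRics", quietly = TRUE)) {
  # List the surveys the account can access; the id column holds survey IDs
  surveys <- qualtRics::all_surveys()

  if (nrow(surveys) > 0) {
    survey_id <- surveys$id[1]  # hypothetical choice: the first survey listed
    responses <- qualtRics::fetch_survey(surveyID = survey_id)
  }
}
```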

Download choice text by default or numeric values with label = FALSE
Set time zone with time_zone = "America/Chicago"
Turn off sublabels with add_var_labels = FALSE

Usage

mydf6 <- qualtRics::fetch_survey("SV_xxxxxxxxxxxxx", save_dir = "data",
                                 label = FALSE, convert = FALSE,
                                 force_request = TRUE, time_zone = "America/Chicago")

Cloud storage

Import data directly from cloud storage

OneDrive {Microsoft365R}
Google Sheets {googlesheets4}
Box {boxr}

Mental model of importing data

Let’s code!

Importing data [Rmd]