Importing data

Jeff Stevens

2023-02-08

Review

Mental model of data in R

Mental model of data analysis

Mental model of importing data

Data files

File types

  • Excel (.xls/.xlsx): Binary matrix file with formatting, formulas, multiple sheets

  • Comma-separated values (.csv): Plain text matrix file without formatting, etc. (also TSV)

  • Other program-specific files: SPSS, SAS, etc.

  • Text files (.txt): Plain text file of raw text

  • Recommendation: save your own data as CSVs and convert other formats to CSV

Dog data

  • Download data for dog breed popularity.

  • Create data/ directory in your dpavir2023 course directory.

  • Save dog_breed_popularity.csv into the data/ directory.

  • View file in RStudio file manager

Importing CSV files

Base R data import

read.csv()

Defaults

  • Header row (turn off with header = FALSE)

  • Comma separated (change with sep = ";" or use read.csv2(), which also expects decimal commas)

  • Outputs data frame

Usage:

read.csv(file = "path/to/file.csv")

library(here)
mydf <- read.csv(here("data/dog_breed_popularity.csv"))
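To see the read.csv() defaults being overridden, here is a minimal sketch. It uses a small made-up file written to a temporary location (not the course's dog data), and the column names passed to col.names are chosen only for illustration.

```r
# Write a small semicolon-separated file without a header row (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Beagle;6", "Bulldog;5", "Poodle;7"), tmp)

# header = FALSE keeps the first line as data; sep = ";" changes the delimiter.
# col.names supplies names because the file itself has none.
dogs <- read.csv(tmp, header = FALSE, sep = ";",
                 col.names = c("breed", "rank"))

class(dogs)  # a base R data frame
str(dogs)
```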

{tidyverse} data import

{readr}

readr::read_csv()

  • Control column names with col_names (including renaming)

  • Control column types with col_types

  • Control missing values with na (quoted_na is deprecated as of readr 2.0.0)

  • Can skip rows before reading data with skip or cut off with n_max

  • Outputs tibble

Usage:

read_csv(file = "path/to/file.csv")

library(readr)
mydf2 <- read_csv(here("data/dog_breed_popularity.csv"))
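The read_csv() arguments listed earlier can be combined in one call. The sketch below assumes readr 2.0 or later (for literal data wrapped in I()) and uses made-up text in place of a real file; the column names and the -99 missing-value code are assumptions for illustration only.

```r
library(readr)

# Made-up raw text standing in for a CSV file: one junk line, then data rows.
raw <- I(paste(
  "exported by some tool",
  "Beagle,6,small",
  "Bulldog,-99,medium",
  "Poodle,7,",
  "Boxer,14,medium",
  sep = "\n"
))

dogs2 <- read_csv(
  raw,
  skip = 1,                                 # skip the junk first line
  col_names = c("breed", "rank", "size"),   # supply column names
  col_types = cols(
    breed = col_character(),
    rank = col_integer(),
    size = col_character()
  ),
  na = c("", "-99"),                        # treat empty strings and -99 as missing
  n_max = 3                                 # read only the first three data rows
)

dogs2  # a tibble
```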

Importing from URLs

Both read.csv() and read_csv() import CSV files available online by using the URL as the path.

https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits.csv

mydf3 <- read.csv("https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits.csv")
mydf4 <- read_csv("https://jeffreyrstevens.quarto.pub/dpavir/data/dog_breed_traits.csv")

Exporting CSVs

write.csv()

  • Character/factor columns are quoted by default (quote = TRUE)

  • Remove row names with row.names = FALSE (write.csv() ignores col.names with a warning; use write.table() to drop the header)

readr::write_csv()

  • Characters are only quoted if they contain a comma, quote, or new line

Usage:

write.csv(df, file = "path/to/file.csv")

write_csv(df, file = "path/to/file.csv")

write.csv(mydf, here("data/newdata.csv"))
write_csv(mydf, here("data/newdata2.csv"))
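The quoting difference between the two writers is easiest to see by inspecting the files they produce. This is a small sketch with a made-up data frame and temporary files, not the course data.

```r
library(readr)

# Made-up data frame with one value that contains a comma.
df <- data.frame(breed = c("Beagle", "Retriever, Golden"), rank = c(6, 3))

base_file  <- tempfile(fileext = ".csv")
readr_file <- tempfile(fileext = ".csv")

write.csv(df, base_file, row.names = FALSE)  # drop the row-name column
write_csv(df, readr_file)

readLines(base_file)   # header and every character value are quoted
readLines(readr_file)  # only the value containing a comma is quoted
```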

Importing other files

Excel data

Import Excel data with {readxl}

Usage:

read_excel(path = "path/to/file.xlsx")

library(readxl)
mydf5 <- read_excel(here("data/dog_breed_data.xlsx"), sheet = "Sheet2")
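When the sheet names are not known in advance, it can help to list them first. The sketch below uses the example workbook bundled with {readxl} rather than the course's dog_breed_data.xlsx, so the sheet name and cell range are specific to that example file.

```r
library(readxl)

# Example workbook that ships with readxl (not the course's dog data).
xlsx_path <- readxl_example("datasets.xlsx")

excel_sheets(xlsx_path)  # list the sheet names before choosing one

# Read one sheet by name and restrict the import to a cell range
# (header row plus five data rows, three columns).
cars <- read_excel(xlsx_path, sheet = "mtcars", range = "A1:C6")
cars
```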

Other stats packages

Import SPSS, SAS, & Stata data with {haven}

SPSS

haven::read_sav("mtcars.sav")

SAS

haven::read_sas("mtcars.sas7bdat")

Stata

haven::read_dta("mtcars.dta")
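The file names above assume those files already exist. For a self-contained check of the round trip, one option is to write a built-in dataset to a temporary SPSS file with haven::write_sav() and read it back; this is only a sketch, and the same pattern should apply to the SAS and Stata readers with their own file types.

```r
library(haven)

# Write a built-in dataset to a temporary SPSS file, then read it back.
sav_file <- tempfile(fileext = ".sav")
write_sav(mtcars, sav_file)

cars_spss <- read_sav(sav_file)
head(cars_spss)

# Labelled columns (common in real SPSS files) can be converted to factors.
# mtcars has no value labels, so this leaves the data unchanged here.
cars_spss <- as_factor(cars_spss)
```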

Qualtrics data

Import Qualtrics data directly with {qualtRics}

  1. Register your Qualtrics credentials with qualtRics::qualtrics_api_credentials()

  2. Get survey ID by viewing qualtRics::all_surveys()

  3. Import data with qualtRics::fetch_survey()

  4. Never have to download Qualtrics data again!
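The steps above might look roughly like the following sketch. The API key and base URL are placeholders, and picking the first survey in the list is an arbitrary choice for illustration; the code only contacts Qualtrics when credentials are already available in the session.

```r
library(qualtRics)

# Step 1 (once per machine): store credentials in .Renviron.
# The key and base URL below are placeholders, not real values.
# qualtrics_api_credentials(api_key = "YOUR_API_KEY",
#                           base_url = "yourorg.qualtrics.com",
#                           install = TRUE)

# Only contact Qualtrics if credentials are available in this session.
if (nzchar(Sys.getenv("QUALTRICS_API_KEY"))) {
  surveys <- all_surveys()        # Step 2: one row per survey, including id and name
  survey_id <- surveys$id[1]      # arbitrary choice: the first survey listed
  responses <- fetch_survey(surveyID = survey_id)  # Step 3: import the responses
}
```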

  • Download choice text by default or numeric values with label = FALSE

  • Set time zone with time_zone = "America/Chicago"

  • Turn off sublabels with add_var_labels = FALSE

Usage:

mydf6 <- qualtRics::fetch_survey("SV_xxxxxxxxxxxxx", save_dir = "data", label = FALSE,
                                 convert = FALSE, force_request = TRUE,
                                 time_zone = "America/Chicago")

Cloud storage

Import data directly from cloud storage

Mental model of importing data

Let’s code!

Importing data [Rmd]