Resources

Packages

To install all of the course packages:

If you are using Windows, first install RTools.
Then copy and paste the following line to install all packages (click the little clipboard in top right corner to copy everything).
Warning: it may take a while for these to install, so don’t start this if you need access to your R session (or open another R session to do this).

install.packages(c("here", "palmerpenguins", "remotes", "tidyverse", "knitr", "rmarkdown", "papaja", "tinytex", "dataReporter", "qualtRics", "readxl", "nycflights13", "lubridate", "ggthemes", "patchwork", "gt"))

Click Full package list below to see all packages with links to their websites.

Full package list

Getting started

Literate programming

Data import/validation

Data processing

Plotting and tables

Glossary

If you want to find definitions of the terms that we use in the course, check out the PsyTeachR Glossary.

Function list

This is a list of all of the functions that we will be learning throughout the course. Note these may change as we progress through the course. Click Full function list below to see all functions with links to their websites.

Full function list

Packages

install.packages(): install R packages
library(): load R packages
pkg::name: accesses an exported object (e.g., a function) from a package without loading the package

Data types

>, >=, <, <=, ==, !=, %in%: comparison operators that output TRUE or FALSE
typeof(), class(), str(): outputs object type, class, and structure
is.numeric(), is.character(), is.factor(): checks whether object is numeric, character, factor
as.numeric(), as.character(), as.factor(): coerces (converts) object to numeric, character, factor
is.na(): checks whether object is NA and outputs logical
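
The type checks, coercions, and comparisons listed above can be sketched in a few lines of base R. This is only an illustration: the vector `x` and its values are made up and do not come from the course data.

```r
# Hypothetical example values, not taken from the course materials
x <- c("1", "2", "three")

typeof(x)       # "character"
is.numeric(x)   # FALSE

# Coercion: elements that cannot be read as numbers become NA
# (as.numeric() would normally warn here, so the warning is silenced)
x_num <- suppressWarnings(as.numeric(x))
is.na(x_num)    # FALSE FALSE TRUE

# Comparison operators return logical vectors; NA stays NA
x_num >= 2      # FALSE TRUE NA
"2" %in% x      # TRUE
```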

Data structures

[]: index elements in vector, matrix, data frame, tibble
$: index column by name in data frame, tibble, list
:, seq(), rep(): creates sequences and repetitions of numbers
length(): outputs length of vector
dim(), nrow(), ncol(): outputs dimensions, number of rows, number of columns of matrices, data frames, tibbles
colnames(): outputs (and can assign) column names
head(), tail(), dplyr::glimpse(): outputs compressed views of data frames, tibbles
c(), list(), data.frame(), tibble::tibble(): creates vectors, lists, data frames, tibbles
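
As a rough sketch of how the indexing and inspection functions above fit together, the following base R snippet builds a small data frame and pulls pieces out of it. The data frame `df` and its columns are hypothetical.

```r
# Hypothetical data frame used only for illustration
df <- data.frame(id = 1:4, score = seq(10, 40, by = 10))

df$score                      # column by name: 10 20 30 40
df[2, "score"]                # row 2 of column "score": 20
high <- df[df$score > 20, ]   # rows where score is above 20

dim(df)                       # 4 2
colnames(df)                  # "id" "score"
length(rep(c("a", "b"), times = 3))  # 6
```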

Importing data

here::here(): starts path at project directory
read.csv(), write.csv(), readr::read_csv(), readr::write_csv(): imports and writes CSV files
readxl::read_excel(): imports Excel files

Validating data

range(), min(), max(): finds range, minimum, and maximum of vector
unique(): returns vector of unique (not duplicated) elements
duplicated(): returns logical vector of duplicated elements
which(): returns indices of which elements of a logical vector are TRUE
summary(): when applied to data, gives summary statistics
skimr::skim(): outputs overview of data
dataReporter::makeCodebook(): creates codebook of data
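
One common validation task is checking for repeated identifiers. A minimal sketch using the base R functions above, assuming a made-up vector of participant IDs:

```r
# Hypothetical participant IDs with two repeated values
ids <- c(101, 102, 102, 104, 105, 105)

range(ids)        # 101 105
unique(ids)       # 101 102 104 105
duplicated(ids)   # FALSE FALSE TRUE FALSE FALSE TRUE

# Positions of the second (and later) occurrences of each repeated ID
dup_positions <- which(duplicated(ids))   # 3 6
```

Note that duplicated() flags only the repeats, not the first occurrence of each value.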

Cleaning columns

dplyr::select(): selects subset of columns from data frame, tibble
dplyr::everything(), dplyr::contains(), dplyr::starts_with(), dplyr::ends_with(): helper functions for select()
dplyr::relocate(), dplyr::rename(): moves and renames columns in data frame, tibble
dplyr::mutate(), dplyr::transmute(): applies function to change existing column or create new column
dplyr::across(): applies function across multiple columns inside mutate()
dplyr::rowwise(): applies function to each row
%>%: pipe operator that transfers output to the next command
dplyr::pull(): creates a vector from a data frame/tibble column

Wrangling rows

dplyr::filter(): filters subset of rows from data frame, tibble
dplyr::if_any(): apply function to columns and return TRUE if any values are TRUE
tidyr::drop_na(): drop rows containing missing values
dplyr::arrange(), dplyr::desc(): sorts rows by column variable, in descending order
dplyr::group_by(): groups data by column levels
dplyr::summarise(): applies function over whole column or group
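
A typical row-wrangling pipeline filters, groups, summarises, and then sorts. The following is a minimal sketch assuming {dplyr} is installed; the tibble `trials` and its values are hypothetical.

```r
library(dplyr)

# Hypothetical reaction-time data with one missing value
trials <- tibble(
  group = c("a", "a", "b", "b", "b"),
  rt    = c(300, NA, 450, 500, 400)
)

group_means <- trials %>%
  filter(!is.na(rt)) %>%                     # keep rows with a recorded rt
  group_by(group) %>%                        # one summary row per group
  summarise(mean_rt = mean(rt), n = n()) %>%
  arrange(desc(mean_rt))                     # largest mean first
```

Here filter(!is.na(rt)) plays the same role that tidyr::drop_na(rt) would.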

Tidy data

tidyr::pivot_longer(), tidyr::pivot_wider(): reshapes data to be longer or wider
tidyr::separate(), tidyr::unite(): separates or combines column data with separator
dplyr::coalesce(): find the first non-missing element
tidyr::complete(), tidyr::expand(), tidyr::nesting(): finds all unique combinations of levels
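
Reshaping is easier to follow with a small example. This sketch assumes {tidyr} is installed and uses a made-up data frame `wide` with one row per participant.

```r
library(tidyr)

# Hypothetical wide data: one column per measurement occasion
wide <- data.frame(id = 1:2, pre = c(10, 20), post = c(15, 25))

# Longer: one row per participant per occasion
long <- pivot_longer(wide, cols = c(pre, post),
                     names_to = "time", values_to = "score")

# Wider: back to one row per participant
back <- pivot_wider(long, names_from = time, values_from = score)
```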

Merging data

dplyr::inner_join(), dplyr::left_join(), dplyr::right_join(), dplyr::full_join(): mutating joins that merge data frames
dplyr::semi_join(), dplyr::anti_join(): filtering joins that filter data frame based on another data frame
dplyr::join_by(): join data frames with different names for key columns (requires {dplyr} v. 1.1.0 or higher)
tibble::add_row(): manually add rows of data
dplyr::bind_rows(), dplyr::bind_cols(): binds rows or columns to data frame
dplyr::intersect(), dplyr::setdiff(), dplyr::union(), dplyr::union_all(): set operations to find overlap, differences, and combinations of data sets
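
The difference between the joins is which rows survive. A minimal sketch, assuming {dplyr} is installed and using two made-up tibbles that share an `id` column:

```r
library(dplyr)

# Hypothetical tables: ids 2 and 3 appear in both
people <- tibble(id = c(1, 2, 3), name = c("Ana", "Ben", "Cy"))
scores <- tibble(id = c(2, 3, 4), score = c(80, 90, 70))

inner <- inner_join(people, scores, by = "id")  # ids 2, 3 (matches only)
left  <- left_join(people, scores, by = "id")   # ids 1, 2, 3 (score is NA for id 1)
full  <- full_join(people, scores, by = "id")   # ids 1, 2, 3, 4
anti  <- anti_join(people, scores, by = "id")   # id 1 (in people, not in scores)
```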

Strings

stringr::str_length(): finds the number of characters in a string
stringr::str_trunc(), stringr::str_pad(): removes or adds characters to strings
stringr::str_trim(), stringr::str_squish(): removes whitespace from strings
stringr::str_c(): combine character vectors into single string
stringr::str_sub(): extracts parts of strings based on character position
stringr::str_to_lower(), stringr::str_to_upper(): converts all letters to lowercase or uppercase
stringr::str_to_title(), stringr::str_to_sentence(): converts strings to title or sentence case
stringr::str_detect(), stringr::str_subset(), stringr::str_extract(): detects, subsets, and extracts strings
stringr::str_replace(), stringr::str_replace_all(): replaces patterns with strings
stringr::str_split(): splits strings based on separators
stringr::str_glue(), stringr::str_glue_data(): combines strings with R output
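
Several of the string functions above are often combined to tidy up messy text. The sketch below assumes {stringr} is installed; the strings in `raw` are made up.

```r
library(stringr)

# Hypothetical messy labels with stray whitespace and mixed case
raw <- c("  Adelie penguin ", "GENTOO   penguin")

clean <- str_to_lower(str_squish(raw))
# "adelie penguin" "gentoo penguin"

str_detect(clean, "gentoo")             # FALSE TRUE
str_replace(clean, "penguin", "bird")   # "adelie bird" "gentoo bird"
str_length(clean)                       # 14 14
str_sub(clean, 1, 6)                    # "adelie" "gentoo"
```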

Factors

levels(): prints factor levels
forcats::fct_inorder(), forcats::fct_rev(): orders levels by order in data or in reverse of current order
forcats::fct_relevel(): manually reorders levels
forcats::fct_reorder(): orders levels based on another variable
forcats::fct_recode(): recodes level with new value
forcats::fct_collapse(): recodes multiple levels into single new value
forcats::fct_lump(): lumps infrequent levels into level “Other”
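
Factor levels default to alphabetical order, which is rarely the order you want in a plot or table. A minimal sketch of reordering them, assuming {forcats} is installed and using a made-up factor `size`:

```r
library(forcats)

# Hypothetical factor; levels default to alphabetical order
size <- factor(c("small", "large", "medium", "small"))
levels(size)                     # "large" "medium" "small"

# Manually set a meaningful order
ordered_size <- fct_relevel(size, "small", "medium", "large")
levels(ordered_size)             # "small" "medium" "large"

levels(fct_rev(ordered_size))    # "large" "medium" "small"
levels(fct_inorder(size))        # "small" "large" "medium"
```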

Grammar of graphics

ggplot2::ggplot(): creates a ggplot
+: adds layers and other components to a ggplot (used in place of the pipe)
ggplot2::aes(): defines aesthetic properties of plot
color, fill, shape, size arguments: properties for geometric objects
ggplot2::ggsave(): saves ggplot to file
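
The pieces above combine into a plot roughly as follows. This is a sketch assuming {ggplot2} is installed; the data frame `df`, its columns, and the file name in the commented-out ggsave() call are all hypothetical.

```r
library(ggplot2)

# Hypothetical data, for illustration only
df <- data.frame(
  x   = c(1, 2, 3, 4),
  y   = c(2, 4, 3, 6),
  grp = c("a", "a", "b", "b")
)

# aes() maps columns to aesthetics; + adds a layer and labels
p <- ggplot(df, aes(x = x, y = y, color = grp)) +
  geom_point(size = 3) +
  labs(x = "Predictor", y = "Outcome")

# Uncomment to write the plot to a file (file name is an assumption)
# ggsave("scatter.png", plot = p, width = 5, height = 4)
```

Setting `size = 3` outside aes() applies it to every point, whereas `color = grp` inside aes() varies color by group.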

Visualizing distributions

ggplot2::geom_histogram(): plots histogram
ggplot2::geom_density(): plots density plot
ggplot2::geom_boxplot(): plots boxplot
ggplot2::geom_violin(): plots violin plot
ggplot2::stat_summary(): plots summaries of data (e.g., means ± standard error)

Visualizing amounts and proportions

dplyr::count(): calculates counts of data by variables
ggplot2::geom_bar(): plots bar plot of counts computed from raw data
ggplot2::geom_col(): plots bar plot from values already in the data (e.g., precomputed counts)
position argument: controls whether data are stacked, dodged, jittered, nudged
ggplot2::geom_point(): plots scatterplots
ggplot2::coord_flip(): flips x and y coordinates
ggplot2::coord_polar(): converts to polar coordinates
ggplot2::geom_linerange(): plots line spanning an interval (e.g., an error bar without caps)

Visualizing x-y data

ggplot2::geom_abline(): plots line with slope and intercept
pairs(): plots scatterplot matrix of all pairs of variables
GGally::ggpairs(): plots correlation plots
ggplot2::geom_tile(): plots tile plot
ggcorrplot::ggcorrplot(): plots correlation heatmaps
ggplot2::geom_line(): plots line plot
ggplot2::geom_area(): plots area under curve or line plot
ggplot2::geom_smooth(): plots fitted lines and curves
ggplot2::geom_rug(): plots rug plot

Color

ggplot2::scale_color_brewer(), ggplot2::scale_fill_brewer(): uses existing qualitative color scales for color and fill
ggplot2::scale_color_manual(), ggplot2::scale_fill_manual(): sets manual colors for color and fill
ggplot2::scale_color_gradient(), ggplot2::scale_fill_gradient(): sets sequential color gradient for color and fill
ggplot2::scale_color_distiller(), ggplot2::scale_fill_distiller(): sets diverging color scale for color and fill

Finessing plots

ggplot2::geom_jitter(): plots jittered scatterplot
ggbeeswarm::geom_beeswarm(): plots beeswarm plot
ggplot2::scale_x_discrete(), ggplot2::scale_y_discrete(): adjusts discrete scale properties (e.g., limits, ticks)
ggplot2::scale_x_continuous(), ggplot2::scale_y_continuous(): adjusts continuous scale properties (e.g., limits, ticks)
ggplot2::lims(), ggplot2::xlim(), ggplot2::ylim(): adjusts axis limits
ggplot2::facet_wrap(), ggplot2::facet_grid(): creates facets based on discrete variables

Adorning plots

ggplot2::labs(), ggplot2::xlab(), ggplot2::ylab(): replaces axis labels
ggplot2::annotate(): annotates plot with text, segments, rectangles, etc.
ggplot2::geom_text(): plots text as aesthetic property
ggplot2::geom_hline(), ggplot2::geom_vline(): plots horizontal and vertical reference lines
ggplot2::stat_ellipse(): plots ellipse around data

Tables

knitr::kable(): creates table from data frame
kableExtra::kable_styling(): styles table
kableExtra::pack_rows(), kableExtra::add_header_above(): adds grouping variables to rows or columns
kableExtra::footnote(): adds table note
kableExtra::landscape(): rotates table to landscape orientation
papaja::apa_table(): formats data frame to APA style table
papaja::apa_print(): formats statistics to APA style

Flashcards

Flashcards can be a useful way to help you learn functions and their descriptions. I created a package called {flashr} that builds decks of HTML flashcards. You’re welcome to build your own decks of flashcards by installing the package and following the instructions for building decks. Or, you can use existing decks built for the course or for each of the chapters of R for Data Science (1st edition).

DPaViR flashcards

Introduction (terms first) (definitions first)
Coding and workflows (terms first) (definitions first)
Data types (terms first) (definitions first)
Data structures (terms first) (definitions first)
Importing data (terms first) (definitions first)
Validating data (terms first) (definitions first)
Cleaning columns (terms first) (definitions first)
Wrangling rows (terms first) (definitions first)
Tidy data (terms first) (definitions first)
Merging data (terms first) (definitions first)

R4DS flashcards
