Data structures

Jeff Stevens

2023-02-06

Review

Mental model of data types

Vectors

Vectors

Actually, everything in R is a vector

vector = atomic vector

  • elements with a single dimension of the same data type

Create vectors with c()

Numeric vectors

(myvec1 <- c(1, 5, 3, 6))
[1] 1 5 3 6
(myvec2 <- c(11, 14, 18, 12))
[1] 11 14 18 12
c(myvec1, myvec2)
[1]  1  5  3  6 11 14 18 12

Create vectors with c()

Character vectors

(myvec3 <- c("a", "b", "c"))
[1] "a" "b" "c"

Create vectors with c()

Strain your brain

What do you think will happen if you combine myvec2 and myvec3?

myvec2
[1] 11 14 18 12
myvec3
[1] "a" "b" "c"
c(myvec2, myvec3)
[1] "11" "14" "18" "12" "a"  "b"  "c" 

Numeric vector myvec2 converts to character vector to combine with myvec3

Create sequences with seq()

seq(from = 0, to = 20, by = 5)
[1]  0  5 10 15 20
seq(from = 20, to = 0, by = -5)
[1] 20 15 10  5  0
seq(0, 1, 0.2)
[1] 0.0 0.2 0.4 0.6 0.8 1.0

Create sequences with :

Sequences with increments of 1

4:9
[1] 4 5 6 7 8 9
9:4
[1] 9 8 7 6 5 4

Try it!

Make a sequence from 0 to 100 in steps of 10.

Create repetitions with rep()

Repeat single numbers

rep(0, times = 10)
 [1] 0 0 0 0 0 0 0 0 0 0

Create repetitions with rep()

Repeat vectors

rep(myvec3, times = 3)
[1] "a" "b" "c" "a" "b" "c" "a" "b" "c"
rep(c("d", "e", "f"), times = 3)
[1] "d" "e" "f" "d" "e" "f" "d" "e" "f"

Create repetitions with rep()

Repeat sequences

rep(1:4, times = 3)
 [1] 1 2 3 4 1 2 3 4 1 2 3 4
rep(1:4, each = 3)
 [1] 1 1 1 2 2 2 3 3 3 4 4 4

Try it!

Create a repetition of “yes” and “no” with 10 instance of each, alternating between the two. Then make one with 10 “yes” and then 10 “no”.

Working with vectors

Find vector length with length()

myvec3
[1] "a" "b" "c"
length(myvec3)
[1] 3

Try it!

How long is the combined vector of myvec1 and myvec2?

Checking typeof() and str()

myvec2
[1] 11 14 18 12
typeof(myvec2)
[1] "double"
str(myvec2)
 num [1:4] 11 14 18 12
myvec3
[1] "a" "b" "c"
typeof(myvec3)
[1] "character"
str(myvec3)
 chr [1:3] "a" "b" "c"

Index with []

Tracks the content of a specific element (starting with 1)

myvec2
[1] 11 14 18 12
myvec2[2]
[1] 14

Allows subsetting

myvec2[2:4]
[1] 14 18 12
myvec2[c(4, 1, 3)]
[1] 12 11 18

Allows reassignment

myvec2[2] <- NA
myvec2
[1] 11 NA 18 12

Lists, data frames, and tibbles

Lists

Recursive vectors (vectors of vectors) potentially with different data types

(mylist <- list(a = 1:4, b = c(4, 3, 8, 5), c = LETTERS[10:15], d = c("yes", "yes")))
$a
[1] 1 2 3 4

$b
[1] 4 3 8 5

$c
[1] "J" "K" "L" "M" "N" "O"

$d
[1] "yes" "yes"

Working with lists

typeof(mylist)
[1] "list"
typeof(mylist$b)
[1] "double"
str(mylist)
List of 4
 $ a: int [1:4] 1 2 3 4
 $ b: num [1:4] 4 3 8 5
 $ c: chr [1:6] "J" "K" "L" "M" ...
 $ d: chr [1:2] "yes" "yes"

Data frames

List of named vectors of the same length (rectangular)

mydf <- data.frame(
  datetime = as.Date(c("2021-04-21 11:56:12", "2021-04-21 14:57:44", "2021-04-22 03:09:56", "2021-04-22 12:39:22")),
  session_complete = as.logical(c("TRUE", "TRUE", "TRUE", "FALSE")),
  condition = as.factor(c("control", "control", "experimental", "experimental")),
  mean_response = c(17.53, 24.45, 19.82, NA),
  age = c(19, 20, 19, NA),
  comments = c("none", "Great study", "toooo long", NA)
  )

Data frames

List of named vectors of the same length (rectangular)

mydf
    datetime session_complete    condition mean_response age    comments
1 2021-04-21             TRUE      control         17.53  19        none
2 2021-04-21             TRUE      control         24.45  20 Great study
3 2021-04-22             TRUE experimental         19.82  19  toooo long
4 2021-04-22            FALSE experimental            NA  NA        <NA>
typeof(mydf)
[1] "list"
str(mydf)
'data.frame':   4 obs. of  6 variables:
 $ datetime        : Date, format: "2021-04-21" "2021-04-21" ...
 $ session_complete: logi  TRUE TRUE TRUE FALSE
 $ condition       : Factor w/ 2 levels "control","experimental": 1 1 2 2
 $ mean_response   : num  17.5 24.4 19.8 NA
 $ age             : num  19 20 19 NA
 $ comments        : chr  "none" "Great study" "toooo long" NA

Creating data frames

Create new vectors

(mydf1 <- data.frame(subject = 1:3, 
                     response = 8:6))
  subject response
1       1        8
2       2        7
3       3        6

Combine existing vectors

var1 <- c(1:6)
var2 <- c(6:1)
var3 <- c(21:26)
mydf2 <- data.frame(var1, var2, 
                    resp = var3)
mydf2
  var1 var2 resp
1    1    6   21
2    2    5   22
3    3    4   23
4    4    3   24
5    5    2   25
6    6    1   26

Index with [row, column]

mydf1
  subject response
1       1        8
2       2        7
3       3        6
mydf1[2, 1] 
[1] 2
mydf1[2, 1] <- 6
mydf1
  subject response
1       1        8
2       6        7
3       3        6

Index with [row, column]

Extract whole rows/columns

mydf1[2, ] 
  subject response
2       6        7
mydf1[, 2] 
[1] 8 7 6

Extract subsets

mydf1[2:3, 2]
[1] 7 6
mydf1[2:3, 1:2]
  subject response
2       6        7
3       3        6

Working with data frames

But extract columns by name with $

mydf1$response 
[1] 8 7 6
mydf1$response[2] 
[1] 7
mydf1$response[2:3] 
[1] 7 6

Strain your brain

Why should you use column names rather than number?

Working with data frames

View first rows with head()

head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Note

Add the argument n = 10 to head(mtcars). What does this do?

Working with data frames

View dimensions

dim(mtcars)
[1] 32 11
nrow(mtcars)
[1] 32
ncol(mtcars)
[1] 11

Tibbles

Tibbles are just tidyverse versions of data frames

mydf2
  var1 var2 resp
1    1    6   21
2    2    5   22
3    3    4   23
4    4    3   24
5    5    2   25
6    6    1   26
(mytibble <- tibble::tibble(mydf2))
# A tibble: 6 × 3
   var1  var2  resp
  <int> <int> <int>
1     1     6    21
2     2     5    22
3     3     4    23
4     4     3    24
5     5     2    25
6     6     1    26

Mental model of data in R

Let’s code!

Data structures coding [Rmd]