2023-02-24
What’s different between these data sets?
What needs to happen to create data2
from data1
?
data1
# A tibble: 4 × 4
id cond1 cond2 cond3
<int> <dbl> <dbl> <dbl>
1 1 0.197 0.871 0.623
2 2 0.0414 0.0636 0.0596
3 3 0.880 0.523 0.366
4 4 0.913 0.530 0.625
data2
id condition response
1 1 cond1 0.19666041
2 1 cond2 0.87096339
3 1 cond3 0.62344725
4 2 cond1 0.04137505
5 2 cond2 0.06362335
6 2 cond3 0.05964197
7 3 cond1 0.88014147
8 3 cond2 0.52252628
9 3 cond3 0.36627275
10 4 cond1 0.91305613
11 4 cond2 0.52961833
12 4 cond3 0.62510585
Each variable has its own column
Each observation has its own row
Each value has its own cell
table1
# A tibble: 6 × 4
country year cases population
<chr> <dbl> <dbl> <dbl>
1 Afghanistan 1999 745 19987071
2 Afghanistan 2000 2666 20595360
3 Brazil 1999 37737 172006362
4 Brazil 2000 80488 174504898
5 China 1999 212258 1272915272
6 China 2000 213766 1280428583
table2
# A tibble: 12 × 4
country year type count
<chr> <dbl> <chr> <dbl>
1 Afghanistan 1999 cases 745
2 Afghanistan 1999 population 19987071
3 Afghanistan 2000 cases 2666
4 Afghanistan 2000 population 20595360
5 Brazil 1999 cases 37737
6 Brazil 1999 population 172006362
7 Brazil 2000 cases 80488
8 Brazil 2000 population 174504898
9 China 1999 cases 212258
10 China 1999 population 1272915272
11 China 2000 cases 213766
12 China 2000 population 1280428583
table3
# A tibble: 6 × 3
country year rate
<chr> <dbl> <chr>
1 Afghanistan 1999 745/19987071
2 Afghanistan 2000 2666/20595360
3 Brazil 1999 37737/172006362
4 Brazil 2000 80488/174504898
5 China 1999 212258/1272915272
6 China 2000 213766/1280428583
table4a
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 745 2666
2 Brazil 37737 80488
3 China 212258 213766
table4b
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 19987071 20595360
2 Brazil 172006362 174504898
3 China 1272915272 1280428583
Think about tidy from a model perspective
Tidyverse assumes tidy data
Easier to analyze and plot tidy data
But sometimes easier to store non-tidy data
table4a
# A tibble: 3 × 3
country `1999` `2000`
<chr> <dbl> <dbl>
1 Afghanistan 745 2666
2 Brazil 37737 80488
3 China 212258 213766
table4a
not tidy?pivot_longer()
pivot_longer(table4a, cols = c(`1999`, `2000`),
names_to = "year", values_to = "cases")
# A tibble: 6 × 3
country year cases
<chr> <chr> <dbl>
1 Afghanistan 1999 745
2 Afghanistan 2000 2666
3 Brazil 1999 37737
4 Brazil 2000 80488
5 China 1999 212258
6 China 2000 213766
table2
# A tibble: 12 × 4
country year type count
<chr> <dbl> <chr> <dbl>
1 Afghanistan 1999 cases 745
2 Afghanistan 1999 population 19987071
3 Afghanistan 2000 cases 2666
4 Afghanistan 2000 population 20595360
5 Brazil 1999 cases 37737
6 Brazil 1999 population 172006362
7 Brazil 2000 cases 80488
8 Brazil 2000 population 174504898
9 China 1999 cases 212258
10 China 1999 population 1272915272
11 China 2000 cases 213766
12 China 2000 population 1280428583
table2
not tidy?pivot_wider()
pivot_wider(table2, id_cols = c("country", "year"),
names_from = type, values_from = count)
# A tibble: 6 × 4
country year cases population
<chr> <dbl> <dbl> <dbl>
1 Afghanistan 1999 745 19987071
2 Afghanistan 2000 2666 20595360
3 Brazil 1999 37737 172006362
4 Brazil 2000 80488 174504898
5 China 1999 212258 1272915272
6 China 2000 213766 1280428583
What code turns data1
into data2
? And vice versa?
data1
# A tibble: 4 × 4
id cond1 cond2 cond3
<int> <dbl> <dbl> <dbl>
1 1 0.197 0.871 0.623
2 2 0.0414 0.0636 0.0596
3 3 0.880 0.523 0.366
4 4 0.913 0.530 0.625
data2
id condition response
1 1 cond1 0.19666041
2 1 cond2 0.87096339
3 1 cond3 0.62344725
4 2 cond1 0.04137505
5 2 cond2 0.06362335
6 2 cond3 0.05964197
7 3 cond1 0.88014147
8 3 cond2 0.52252628
9 3 cond3 0.36627275
10 4 cond1 0.91305613
11 4 cond2 0.52961833
12 4 cond3 0.62510585