Dates and times

Jeff Stevens

2023-03-22

Introduction

The problem

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.0     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.1     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

What’s different between these data sets?

data1
# A tibble: 12 × 2
   test_date  birth_date 
   <date>     <chr>      
 1 2023-01-02 1997-07-14 
 2 2023-01-02 1998-01-28 
 3 2023-01-05 1967-07-23 
 4 2023-01-05 Jan 9, 1960
 5 2023-01-08 1950-11-09 
 6 2023-01-14 2001-08-24 
 7 2023-01-16 1979-09-23 
 8 2023-01-23 1970-03-22 
 9 2023-01-26 1957-04-21 
10 2023-01-27 1989-03-07 
11 2023-01-27 1983-11-03 
12 2023-01-28 1989-01-31 
data2
# A tibble: 9 × 4
  test_date  birth_date age_at_testing day_of_birth
  <date>     <date>     <drtn>         <ord>       
1 2023-01-05 1967-07-23 20255 days     Sunday      
2 2023-01-05 1960-01-09 23007 days     Saturday    
3 2023-01-08 1950-11-09 26358 days     Thursday    
4 2023-01-16 1979-09-23 15821 days     Sunday      
5 2023-01-23 1970-03-22 19300 days     Sunday      
6 2023-01-26 1957-04-21 24021 days     Sunday      
7 2023-01-27 1989-03-07 12379 days     Tuesday     
8 2023-01-27 1983-11-03 14330 days     Thursday    
9 2023-01-28 1989-01-31 12415 days     Tuesday     

Set-up

Dates and times

Reminder

Dates and times are augmented doubles

(x <- as.Date("2023-03-22"))
[1] "2023-03-22"
class(x)
[1] "Date"
typeof(x)
[1] "double"
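To see what "augmented double" means in practice, it can help to strip the class and look at the number underneath. The sketch below uses only base R; the object names are illustrative.

```r
# A Date is a double counting days since 1970-01-01, plus a class attribute
x <- as.Date("2023-03-22")
unclass(x)                              # 19438
as.Date(19438, origin = "1970-01-01")   # "2023-03-22"

# A date-time (POSIXct) is a double counting seconds since 1970-01-01 00:00:00 UTC
dt <- as.POSIXct("2023-03-22 10:30:00", tz = "UTC")
as.numeric(dt)                          # 1679481000
```

Because the underlying value is just a number, arithmetic and comparisons on dates work the same way they do on doubles.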

Note

The standard (ISO 8601) way to represent dates and times is

YYYY-MM-DD HH:MM:SS, for example 2023-03-22 10:30:00 (strict ISO 8601 puts a T between the date and the time; R prints a space)

Dates and times with {lubridate}

Current date/time

Sys.Date()  # base R
[1] "2023-03-26"
today() # {lubridate}
[1] "2023-03-26"
Sys.time()  # base R
[1] "2023-03-26 15:02:08 CDT"
now()  # {lubridate}
[1] "2023-03-26 15:02:08 CDT"

Creating dates/times

as.Date("2023-03-22")  # base R
[1] "2023-03-22"
as_date("2023-03-22") # {lubridate}
[1] "2023-03-22"
ymd(20230322) # {lubridate}
[1] "2023-03-22"

Convert dates to ISO-8601

ymd("2017-01-31")
[1] "2017-01-31"
mdy("January 31st, 2017")
[1] "2017-01-31"
mdy("Jan 31 17")
[1] "2017-01-31"
dmy("31-Jan-2017")
[1] "2017-01-31"

Convert dates to ISO-8601

(r_class_schedule <- tibble(meeting = 1:4, date = c("23 Jan 2023", "25 Jan 2023", "27 Jan 2023", "30 Jan 2023"), topic = c("Course introduction", "Working in RStudio", "Coding basics", "Workflows")))
# A tibble: 4 × 3
  meeting date        topic              
    <int> <chr>       <chr>              
1       1 23 Jan 2023 Course introduction
2       2 25 Jan 2023 Working in RStudio 
3       3 27 Jan 2023 Coding basics      
4       4 30 Jan 2023 Workflows          

How do we change the dates in a data frame?

(r_class_schedule <- r_class_schedule |>
   mutate(iso_date = dmy(date)))
# A tibble: 4 × 4
  meeting date        topic               iso_date  
    <int> <chr>       <chr>               <date>    
1       1 23 Jan 2023 Course introduction 2023-01-23
2       2 25 Jan 2023 Working in RStudio  2023-01-25
3       3 27 Jan 2023 Coding basics       2023-01-27
4       4 30 Jan 2023 Workflows           2023-01-30

Convert multiple formats

What if your date column has multiple formats?

(bad_dates <- c("Jan 1 2023", "2-Jan-2023"))
[1] "Jan 1 2023" "2-Jan-2023"
as_date(bad_dates)
Warning: All formats failed to parse. No formats found.
[1] NA NA

Date formatting

Format codes for the components of a date

Code  Component
%y    Two-digit year (23)
%Y    Four-digit year (2023)
%m    Month as number (01-12 or 1-12)
%b    Abbreviated month name (Mar)
%B    Full month name (March)
%d    Day of the month (01-31 or 1-31)

Date formatting

Combine codes to make dates

2023-03-22 = "%Y-%m-%d"

3/22/23 = "%m/%d/%y"

22 Mar 2023 = "%d %b %Y"

March 22, 2023 = "%B %d, %Y"
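The same codes work in both directions: format() turns a date into a string, and the format argument tells a parser how to read a string. A minimal base R sketch follows; month names assume an English locale.

```r
x <- as.Date("2023-03-22")

# Date -> string
format(x, "%Y-%m-%d")   # "2023-03-22"
format(x, "%m/%d/%y")   # "03/22/23" (output is zero-padded)
format(x, "%d %b %Y")   # "22 Mar 2023" in an English locale

# String -> Date: the codes describe the layout of the input
y <- as.Date("3/22/23", format = "%m/%d/%y")
y == x                  # TRUE
```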

Date formatting

(bad_dates <- c("Jan 01 2023", "02-Jan-2023"))
[1] "Jan 01 2023" "02-Jan-2023"
as_date(bad_dates, format = "%b %d %Y")
Warning: 1 failed to parse.
[1] "2023-01-01" NA          
as_date(bad_dates, format = c("%b %d %Y", "%d-%b-%Y"))
[1] "2023-01-01" "2023-01-02"
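When a column mixes layouts, another option is lubridate's parse_date_time(), which takes component orders instead of full format strings and guesses the separators. This is a sketch of an alternative, not a replacement for the approach above; it assumes an English locale for the month names.

```r
library(lubridate)

bad_dates <- c("Jan 01 2023", "02-Jan-2023")

# Each order is tried against the strings; "mdy" fits the first, "dmy" the second
parsed <- parse_date_time(bad_dates, orders = c("mdy", "dmy"))

# parse_date_time() returns date-times, so convert back to dates
as_date(parsed)
```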

Convert times to ISO-8601

hms("20:11:59")
[1] "20H 11M 59S"
hm("10:30")
[1] "10H 30M 0S"
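hms() and hm() return lubridate periods rather than clock times tied to a date. A short sketch of what can be done with them (object names are illustrative):

```r
library(lubridate)

p <- hms("20:11:59")
hour(p)                # 20
period_to_seconds(p)   # 72719

# Adding a period to a date-time moves the clock forward
start <- ymd_hms("2023-03-22 00:00:00")
start + hm("10:30")    # "2023-03-22 10:30:00 UTC"
```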

Convert date-times to ISO-8601

as_datetime("2023-03-10")
[1] "2023-03-10 UTC"
ymd_hms("2023-03-10 20:11:59")
[1] "2023-03-10 20:11:59 UTC"
mdy_hm("03/22/2023 10:30")
[1] "2023-03-22 10:30:00 UTC"

Change time zone

tz argument

ymd_hms("2023-03-10 20:11:59", tz = "America/Chicago")
[1] "2023-03-10 20:11:59 CST"

Find system time zone

Sys.timezone()
[1] "America/Chicago"
mdy_hm("03/22/2023 10:30", tz = Sys.timezone())
[1] "2023-03-22 10:30:00 CDT"

Warning

Setting tz = Sys.timezone() is dangerous and not reproducible if you are traveling or giving code to others in different time zones.
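A more reproducible pattern is to name the time zone explicitly and convert on purpose. The sketch below contrasts lubridate's with_tz() and force_tz(); the choice of zones is only for illustration.

```r
library(lubridate)

x <- ymd_hms("2023-03-10 20:11:59", tz = "America/Chicago")

# with_tz(): same instant, shown on a different clock
x_utc <- with_tz(x, tzone = "UTC")       # "2023-03-11 02:11:59 UTC"
x_utc == x                               # TRUE

# force_tz(): same clock reading, but a different instant
x_forced <- force_tz(x, tzone = "UTC")   # "2023-03-10 20:11:59 UTC"
difftime(x, x_forced, units = "hours")   # Time difference of 6 hours
```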

Date/time components

Create dates from components

library(nycflights13)  # provides the flights data
flights |>
  select(year, month, day, hour, minute)
# A tibble: 336,776 × 5
    year month   day  hour minute
   <int> <int> <int> <dbl>  <dbl>
 1  2013     1     1     5     15
 2  2013     1     1     5     29
 3  2013     1     1     5     40
 4  2013     1     1     5     45
 5  2013     1     1     6      0
 6  2013     1     1     5     58
 7  2013     1     1     6      0
 8  2013     1     1     6      0
 9  2013     1     1     6      0
10  2013     1     1     6      0
# … with 336,766 more rows

Create dates from components

make_date(), make_datetime()

flights |>
  select(year, month, day, hour, minute) |>
  mutate(date = make_date(year, month, day),
         departure = make_datetime(year, month, day, hour, minute))
# A tibble: 336,776 × 7
    year month   day  hour minute date       departure          
   <int> <int> <int> <dbl>  <dbl> <date>     <dttm>             
 1  2013     1     1     5     15 2013-01-01 2013-01-01 05:15:00
 2  2013     1     1     5     29 2013-01-01 2013-01-01 05:29:00
 3  2013     1     1     5     40 2013-01-01 2013-01-01 05:40:00
 4  2013     1     1     5     45 2013-01-01 2013-01-01 05:45:00
 5  2013     1     1     6      0 2013-01-01 2013-01-01 06:00:00
 6  2013     1     1     5     58 2013-01-01 2013-01-01 05:58:00
 7  2013     1     1     6      0 2013-01-01 2013-01-01 06:00:00
 8  2013     1     1     6      0 2013-01-01 2013-01-01 06:00:00
 9  2013     1     1     6      0 2013-01-01 2013-01-01 06:00:00
10  2013     1     1     6      0 2013-01-01 2013-01-01 06:00:00
# … with 336,766 more rows
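The same functions work on plain vectors, which makes them easy to try without the flights data. In this sketch the vectors are made up, and the last call assumes the clock readings are New York local time; make_datetime() uses UTC unless told otherwise.

```r
library(lubridate)

year <- c(2013, 2013)
month <- c(1, 12)
day <- c(1, 31)
hour <- c(5, 23)
minute <- c(15, 59)

make_date(year, month, day)                            # "2013-01-01" "2013-12-31"
dep <- make_datetime(year, month, day, hour, minute)   # UTC by default

# Supplying tz keeps the clock reading but places it in that zone
dep_ny <- make_datetime(2013, 1, 1, 5, 15, tz = "America/New_York")
```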

Extract date/time elements

First, let’s extract a random sample of departure times

(datetime <- flights |>
   drop_na(dep_time) |> 
   slice_sample(n = 20) |>
   mutate(departure = make_datetime(year, month, day, hour, minute)) |> 
   pull(departure))
 [1] "2013-05-02 06:00:00 UTC" "2013-09-16 20:00:00 UTC"
 [3] "2013-09-18 20:05:00 UTC" "2013-01-30 22:49:00 UTC"
 [5] "2013-05-13 07:05:00 UTC" "2013-04-23 16:00:00 UTC"
 [7] "2013-06-04 20:40:00 UTC" "2013-08-12 13:30:00 UTC"
 [9] "2013-05-07 15:25:00 UTC" "2013-04-07 22:25:00 UTC"
[11] "2013-05-21 07:00:00 UTC" "2013-04-22 20:00:00 UTC"
[13] "2013-09-09 15:35:00 UTC" "2013-07-12 09:59:00 UTC"
[15] "2013-08-06 19:39:00 UTC" "2013-12-01 10:56:00 UTC"
[17] "2013-07-28 06:55:00 UTC" "2013-12-19 06:30:00 UTC"
[19] "2013-06-05 16:29:00 UTC" "2013-09-18 17:25:00 UTC"

Extract date/time elements

Now let’s extract components

year(datetime)
 [1] 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013
[16] 2013 2013 2013 2013 2013
month(datetime)
 [1]  5  9  9  1  5  4  6  8  5  4  5  4  9  7  8 12  7 12  6  9
month(datetime, label = TRUE)
 [1] May Sep Sep Jan May Apr Jun Aug May Apr May Apr Sep Jul Aug Dec Jul Dec Jun
[20] Sep
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec

Extract date/time elements

Now let’s extract components

mday(datetime)
 [1]  2 16 18 30 13 23  4 12  7  7 21 22  9 12  6  1 28 19  5 18
yday(datetime)
 [1] 122 259 261  30 133 113 155 224 127  97 141 112 252 193 218 335 209 353 156
[20] 261
wday(datetime)
 [1] 5 2 4 4 2 3 3 2 3 1 3 2 2 6 3 1 1 5 4 4
wday(datetime, label = TRUE, abbr = FALSE)
 [1] Thursday  Monday    Wednesday Wednesday Monday    Tuesday   Tuesday  
 [8] Monday    Tuesday   Sunday    Tuesday   Monday    Monday    Friday   
[15] Tuesday   Sunday    Sunday    Thursday  Wednesday Wednesday
7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

Extract date/time elements

Now let’s extract components

hour(datetime)
 [1]  6 20 20 22  7 16 20 13 15 22  7 20 15  9 19 10  6  6 16 17
minute(datetime)
 [1]  0  0  5 49  5  0 40 30 25 25  0  0 35 59 39 56 55 30 29 25
second(datetime)  # second() extracts the component; seconds() would build a period instead
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
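lubridate separates singular accessors such as second(), which pull a component out of a date-time, from plural constructors such as seconds(), which create a time span. A small sketch with a made-up date-time:

```r
library(lubridate)

x <- ymd_hms("2013-05-02 06:00:07")

second(x)         # 7: the seconds component
seconds(90)       # "90S": a period of 90 seconds
x + seconds(90)   # "2013-05-02 06:01:37 UTC"
```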

Create vectors of days of the week

wday(1:7, label = TRUE, abbr = FALSE)
[1] Sunday    Monday    Tuesday   Wednesday Thursday  Friday    Saturday 
7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
as.character(wday(1:7, label = TRUE, abbr = FALSE))
[1] "Sunday"    "Monday"    "Tuesday"   "Wednesday" "Thursday"  "Friday"   
[7] "Saturday" 
stringr::str_c(as.character(wday(1:7, label = TRUE, abbr = FALSE)), collapse = ", ")
[1] "Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday"

Set date/time elements with components

head(datetime)
[1] "2013-05-02 06:00:00 UTC" "2013-09-16 20:00:00 UTC"
[3] "2013-09-18 20:05:00 UTC" "2013-01-30 22:49:00 UTC"
[5] "2013-05-13 07:05:00 UTC" "2013-04-23 16:00:00 UTC"
year(datetime) <- 2020
head(datetime)
[1] "2020-05-02 06:00:00 UTC" "2020-09-16 20:00:00 UTC"
[3] "2020-09-18 20:05:00 UTC" "2020-01-30 22:49:00 UTC"
[5] "2020-05-13 07:05:00 UTC" "2020-04-23 16:00:00 UTC"
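Assigning with year(datetime) <- 2020 modifies the object in place. If a modified copy is preferred, or several components need to change at once, lubridate's update() is an alternative. A sketch with made-up values:

```r
library(lubridate)

x <- ymd_hms("2013-05-02 06:00:00")

# update() returns a changed copy; x itself is untouched
x2020 <- update(x, year = 2020, hour = 8)   # "2020-05-02 08:00:00 UTC"

# Values that are too large roll over into the next unit
rolled <- update(ymd("2013-02-01"), mday = 30)   # "2013-03-02"
```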

Time spans

Find or create durations

r_class_schedule$iso_date[2] - r_class_schedule$iso_date[1]
Time difference of 2 days
today() - r_class_schedule$iso_date[1]
Time difference of 62 days
r_class_schedule$iso_date[1] - 7 * 9
[1] "2022-11-21"
r_class_schedule$iso_date[1] + 7 * 9
[1] "2023-03-27"
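Subtracting two dates gives a base R difftime. The sketch below shows how to get a plain number out of it, how to convert it to a lubridate duration, and how to add a named span instead of writing 7 * 9. The two dates are copied from the class schedule; the object names are illustrative.

```r
library(lubridate)

first_class <- ymd("2023-01-23")
second_class <- ymd("2023-01-25")

gap <- second_class - first_class   # Time difference of 2 days
as.numeric(gap, units = "days")     # 2
as.duration(gap)                    # "172800s (~2 days)"

# Periods make the unit explicit
first_class + weeks(9)              # "2023-03-27"
first_class - days(7 * 9)           # "2022-11-21"
```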

Filter dates

(oldsched <- filter(r_class_schedule, iso_date < "2023-01-30") |>
  mutate(week_later = iso_date + 7,
         days_since = today() - iso_date))
# A tibble: 3 × 6
  meeting date        topic               iso_date   week_later days_since
    <int> <chr>       <chr>               <date>     <date>     <drtn>    
1       1 23 Jan 2023 Course introduction 2023-01-23 2023-01-30 62 days   
2       2 25 Jan 2023 Working in RStudio  2023-01-25 2023-02-01 60 days   
3       3 27 Jan 2023 Coding basics       2023-01-27 2023-02-03 58 days   

Solving the problem

What code generates data2 from data1?

data1
# A tibble: 12 × 2
   test_date  birth_date 
   <date>     <chr>      
 1 2023-01-02 1997-07-14 
 2 2023-01-02 1998-01-28 
 3 2023-01-05 1967-07-23 
 4 2023-01-05 Jan 9, 1960
 5 2023-01-08 1950-11-09 
 6 2023-01-14 2001-08-24 
 7 2023-01-16 1979-09-23 
 8 2023-01-23 1970-03-22 
 9 2023-01-26 1957-04-21 
10 2023-01-27 1989-03-07 
11 2023-01-27 1983-11-03 
12 2023-01-28 1989-01-31 
data2
# A tibble: 9 × 4
  test_date  birth_date age_at_testing day_of_birth
  <date>     <date>     <drtn>         <ord>       
1 2023-01-05 1967-07-23 20255 days     Sunday      
2 2023-01-05 1960-01-09 23007 days     Saturday    
3 2023-01-08 1950-11-09 26358 days     Thursday    
4 2023-01-16 1979-09-23 15821 days     Sunday      
5 2023-01-23 1970-03-22 19300 days     Sunday      
6 2023-01-26 1957-04-21 24021 days     Sunday      
7 2023-01-27 1989-03-07 12379 days     Tuesday     
8 2023-01-27 1983-11-03 14330 days     Thursday    
9 2023-01-28 1989-01-31 12415 days     Tuesday     
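One possible route from data1 to data2 is sketched here; it is not necessarily the intended answer. The filter rule (born before 1990) is an assumption inferred from which rows disappear, and parse_date_time() with two orders is one of several ways to handle the mixed birth_date formats. Weekday labels assume an English locale.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Rebuild data1 as shown on the slide
data1 <- tibble(
  test_date = ymd(c("2023-01-02", "2023-01-02", "2023-01-05", "2023-01-05",
                    "2023-01-08", "2023-01-14", "2023-01-16", "2023-01-23",
                    "2023-01-26", "2023-01-27", "2023-01-27", "2023-01-28")),
  birth_date = c("1997-07-14", "1998-01-28", "1967-07-23", "Jan 9, 1960",
                 "1950-11-09", "2001-08-24", "1979-09-23", "1970-03-22",
                 "1957-04-21", "1989-03-07", "1983-11-03", "1989-01-31")
)

data2 <- data1 |>
  # Parse both layouts, then drop the time part
  mutate(birth_date = as_date(parse_date_time(birth_date,
                                              orders = c("ymd", "mdy")))) |>
  # Assumed rule: keep participants born before 1990
  filter(birth_date < ymd("1990-01-01")) |>
  mutate(age_at_testing = test_date - birth_date,
         day_of_birth = wday(birth_date, label = TRUE, abbr = FALSE))

data2
```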

Let’s code!

Dates and times [Rmd]