Iteration

Lecture 13

Dr. Benjamin Soltoff

Cornell University
INFO 2950 - Spring 2024

March 7, 2024

Announcements


  • Lab 04
  • Homework 04
  • Project proposals

Atomic vectors

Subsetting vectors with [] and [[]]

x <- c("one", "two", "three", "four", "five")
  • With positive integers
x[c(3, 2, 5)]
## [1] "three" "two"   "five"
  • With negative integers
x[c(-1, -3, -5)]
## [1] "two"  "four"
  • Don’t mix positive and negative
x[c(-1, 1)]
## Error in x[c(-1, 1)]: only 0's may be mixed with negative subscripts
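The slide title also names [[]], which the examples above do not show. A minimal sketch of how it differs from [] on an atomic vector; the named vector `scores` is a hypothetical example, not part of the lecture:

```r
x <- c("one", "two", "three", "four", "five")

# [ ] can return any number of elements
first_two <- x[c(1, 2)]

# [[ ]] extracts exactly one element
second <- x[[2]]

# On a named vector, [ ] keeps the name and [[ ]] drops it
scores <- c(a = 1, b = 2)   # hypothetical example vector
kept <- scores["a"]
dropped <- scores[["a"]]

names(kept)      # "a"
names(dropped)   # NULL
```

Asking [[ ]] for more than one element of an atomic vector, as in x[[c(1, 2)]], is an error, which makes it a useful guard inside loops where exactly one element is expected.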

Subset with a logical vector

(x <- c(10, 3, NA, 5, 8, 1, NA))
[1] 10  3 NA  5  8  1 NA
# All non-missing values of x
!is.na(x)
[1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
x[!is.na(x)]
[1] 10  3  5  8  1
# All even (or missing!) values of x
x[x %% 2 == 0]
[1] 10 NA  8 NA
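The NA values survive the last subset because NA %% 2 == 0 is itself NA, and subsetting with NA returns NA. Two ways to keep only the even, non-missing values, sketched with the same x:

```r
x <- c(10, 3, NA, 5, 8, 1, NA)

# Combine both conditions: FALSE & NA is FALSE, so missing values are dropped
even_only <- x[!is.na(x) & x %% 2 == 0]

# which() returns only the positions that are TRUE, ignoring NA
even_which <- x[which(x %% 2 == 0)]

even_only
even_which
```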

Lists


x <- list(1, 2, 3)
x
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

Lists: str()

str(x)
List of 3
 $ : num 1
 $ : num 2
 $ : num 3
x_named <- list(a = 1, b = 2, c = 3)
str(x_named)
List of 3
 $ a: num 1
 $ b: num 2
 $ c: num 3

Store a mix of objects

y <- list("a", 1L, 1.5, TRUE)
str(y)
List of 4
 $ : chr "a"
 $ : int 1
 $ : num 1.5
 $ : logi TRUE

Nested lists

z <- list(list(1, 2), list(3, 4))
str(z)
List of 2
 $ :List of 2
  ..$ : num 1
  ..$ : num 2
 $ :List of 2
  ..$ : num 3
  ..$ : num 4
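The same subsetting operators apply to lists, with one important difference: [] always returns a list, while [[]] and $ return the element itself. A short sketch reusing x_named and z from the slides above:

```r
x_named <- list(a = 1, b = 2, c = 3)
z <- list(list(1, 2), list(3, 4))

sub_list <- x_named["a"]      # still a list, of length 1
value <- x_named[["a"]]       # the number stored in element a
same_value <- x_named$a       # extracts the same element by name

# Chain [[ ]] to reach into a nested list:
# the second item of the first sub-list
inner <- z[[1]][[2]]

str(sub_list)
str(value)
str(inner)
```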

Stealth lists

library(tidyverse)  # loads ggplot2, which provides the diamonds data
str(diamonds)
tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
 $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
 $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
 $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
 $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
 $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
 $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
 $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
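A data frame is a "stealth list" because each column is one element of an underlying list. A quick check, using the built-in mtcars data frame in place of diamonds so that the sketch needs no packages:

```r
# A data frame is stored as a list of equal-length column vectors
typeof(mtcars)
is.list(mtcars)

# Its length is the number of columns, not the number of rows
length(mtcars) == ncol(mtcars)

# So list subsetting works on columns
identical(mtcars[["mpg"]], mtcars$mpg)
```

This is why seq_along(df) and df[[i]] step through columns in the for loop later in the lecture.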

Iteration


df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)
median(df$a)
[1] -0.07983455
median(df$b)
[1] 0.3802926
median(df$c)
[1] -0.6769652
median(df$d)
[1] 0.4901909

Iteration three ways

  1. for loops
  2. map_*() functions
  3. across()
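As a preview, here is a rough sketch of all three approaches computing the same column medians. The tibble `df_demo` is a hypothetical stand-in with fixed values so the results are easy to check by hand; it assumes the tibble, purrr, and dplyr packages are installed.

```r
library(tibble)
library(purrr)
library(dplyr)

# Hypothetical example data with known medians (2 and 20)
df_demo <- tibble(a = c(1, 2, 3), b = c(10, 20, 60))

# 1. for loop: preallocate, then fill one element per column
loop_out <- vector(mode = "double", length = ncol(df_demo))
for (i in seq_along(df_demo)) {
  loop_out[[i]] <- median(df_demo[[i]])
}

# 2. map_dbl(): returns a named double vector
map_out <- map_dbl(df_demo, median)

# 3. across(): returns a one-row tibble
across_out <- df_demo |>
  summarize(across(everything(), median))

loop_out
map_out
across_out
```

The three differ mainly in the shape of the result: an unnamed vector, a named vector, and a tibble.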

Iteration with for loops


output <- vector(mode = "double", length = ncol(df))
for (i in seq_along(df)) {
  output[[i]] <- median(df[[i]])
}
output
[1] -0.07983455  0.38029264 -0.67696525  0.49019094

Output

output <- vector(mode = "double", length = ncol(df))
vector(mode = "double", length = ncol(df))
[1] 0 0 0 0
vector(mode = "logical", length = ncol(df))
[1] FALSE FALSE FALSE FALSE
vector(mode = "character", length = ncol(df))
[1] "" "" "" ""
vector(mode = "list", length = ncol(df))
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL

Sequence

i in seq_along(df)
seq_along(df)
[1] 1 2 3 4
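seq_along() is preferred over 1:length() because it behaves sensibly when the input is empty. A small sketch with a zero-length vector (the counters `n_bad` and `n_good` are illustrative names only):

```r
empty <- vector(mode = "double", length = 0)

1:length(empty)     # counts down from 1 to 0
seq_along(empty)    # an empty sequence

# The loop body runs twice with 1:length(), but never with seq_along()
n_bad <- 0
for (i in 1:length(empty)) {
  n_bad <- n_bad + 1
}

n_good <- 0
for (i in seq_along(empty)) {
  n_good <- n_good + 1
}

n_bad
n_good
```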

Body

output[[i]] <- median(df[[i]])

Preallocation

# No preallocation: the tibble is copied and regrown on every iteration
mpg_no_preall <- tibble()

for (i in 1:100) {
  mpg_no_preall <- bind_rows(mpg_no_preall, mpg)
}

# With preallocation: fill a list of known length, then combine once
mpg_preall <- vector(mode = "list", length = 100)

for (i in 1:100) {
  mpg_preall[[i]] <- mpg
}

mpg_preall <- list_rbind(mpg_preall)
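To see the cost of growing an object, the two strategies can be timed with system.time(). This is a base R sketch on a plain numeric vector rather than the mpg example; the helper functions `grow()` and `prealloc()` are hypothetical names, and actual timings will vary by machine.

```r
n <- 1e4

# Grow the vector by one element per iteration (copies it each time)
grow <- function(n) {
  out <- double(0)
  for (i in seq_len(n)) {
    out <- c(out, i^2)
  }
  out
}

# Allocate the full vector once, then fill it in place
prealloc <- function(n) {
  out <- vector(mode = "double", length = n)
  for (i in seq_len(n)) {
    out[[i]] <- i^2
  }
  out
}

# Both give the same result
identical(grow(n), prealloc(n))

# Compare elapsed time
system.time(grow(n))
system.time(prealloc(n))
```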

Iteration with map_*() functions

Map functions

  • Why for loops are good
  • Why map() functions may be better
  • Types of map() functions
    • map() makes a list
    • map_lgl() makes a logical vector
    • map_int() makes an integer vector
    • map_dbl() makes a double vector
    • map_chr() makes a character vector
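A small sketch of each variant applied to the same input, assuming purrr is installed. The list `cols` is a hypothetical example; the suffix of the function determines the type of vector that comes back.

```r
library(purrr)

# Hypothetical example input: a named list of numeric vectors
cols <- list(a = c(1.5, 2.5), b = c(3, 4, 5))

as_list <- map(cols, length)          # list
as_lgl <- map_lgl(cols, is.numeric)   # logical vector
as_int <- map_int(cols, length)       # integer vector
as_dbl <- map_dbl(cols, mean)         # double vector
as_chr <- map_chr(cols, typeof)       # character vector

str(as_list)
as_lgl
as_int
as_dbl
as_chr
```

The typed variants keep the names of the input, so the result can be indexed by name, for example as_dbl[["a"]].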

Map functions

map_dbl(.x = df, .f = mean)
          a           b           c           d 
 0.07462564  0.20862196 -0.42455887  0.32204455 
map_dbl(.x = df, .f = median)
          a           b           c           d 
-0.07983455  0.38029264 -0.67696525  0.49019094 
map_dbl(.x = df, .f = sd)
        a         b         c         d 
0.9537841 1.0380734 0.9308092 0.5273024 

Map functions

map_dbl(.x = df, .f = \(x) mean(x, na.rm = TRUE))
          a           b           c           d 
 0.07462564  0.20862196 -0.42455887  0.32204455 
df |>
  map_dbl(.f = \(x) mean(x, na.rm = TRUE))
          a           b           c           d 
 0.07462564  0.20862196 -0.42455887  0.32204455 
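Since df has no missing values, na.rm = TRUE makes no visible difference in the output above. A sketch with a hypothetical list `df_na` that does contain an NA shows what the anonymous function buys you; it assumes purrr is installed and R 4.1 or later for the \(x) shorthand.

```r
library(purrr)

# Hypothetical example input with one missing value
df_na <- list(a = c(1, NA, 3), b = c(2, 4, 6))

# Without na.rm, any column containing NA has an NA mean
plain <- map_dbl(df_na, mean)

# The anonymous function passes na.rm = TRUE on every call
shorthand <- map_dbl(df_na, \(x) mean(x, na.rm = TRUE))

# \(x) is shorthand for function(x)
longhand <- map_dbl(df_na, function(x) mean(x, na.rm = TRUE))

plain
shorthand
identical(shorthand, longhand)
```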

Application exercise

ae-11

  • Go to the course GitHub org and find your ae-11 (repo name will be suffixed with your GitHub name).
  • Clone the repo in RStudio Workbench, open the R script in the repo, and follow along and complete the exercises.
  • Render, commit, and push your edits by the AE deadline – end of tomorrow

Recap

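  • Use [] and [[]] to extract elements from an atomic vector or list; $ extracts named elements from a list (including data frame columns) but does not work on atomic vectors
  • for loops and map_*() functions are two common methods for iteration in R
  • When using for loops, preallocate the output vector instead of growing it on each iteration
  • map_*() functions are a family of functions that apply a function to each element of a vector or list; the suffix sets the type of the output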
