library(tidyverse)
library(tidymodels)
library(palmerpenguins)
library(skimr)
Palmer Penguins and regression with multiple predictors
In this application exercise we will continue to study penguins. The data can be found in the palmerpenguins package and we will use tidyverse and tidymodels for data exploration and modeling, respectively.
Please read the following context and take a skim of the data set before we get started.
This data set comprises various measurements of three penguin species: Adelie, Gentoo, and Chinstrap. The rigorous study was conducted on the islands of the Palmer Archipelago, Antarctica. These data were collected from 2007 to 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program, part of the US Long Term Ecological Research Network. The data set is called penguins.
skim(penguins)
Our goal is to understand better how various body measurements and attributes of penguins relate to their body mass.
Body mass vs. flipper length
The regression model for body mass vs. flipper length is as follows.
bm_fl_fit <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm, data = penguins)
tidy(bm_fl_fit)
And this is the regression model for body mass vs. island.
bm_island_fit <- linear_reg() |>
  fit(body_mass_g ~ island, data = penguins)
tidy(bm_island_fit)
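The island model reports an intercept and two island terms because island is a factor with three levels, and R's default treatment coding absorbs the first level into the intercept as the baseline. A quick sketch for checking which island that is (the name island_levels is introduced here for illustration):

```r
library(palmerpenguins)

# Factor levels of island; under R's default treatment contrasts,
# the first level is the baseline captured by the intercept
island_levels <- levels(penguins$island)
island_levels

# Number of penguins observed on each island
table(penguins$island)
```

Each island coefficient in the tidy output is then read as a difference in mean body mass relative to that baseline island.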
Body mass vs. flipper length (transformed)
- Your turn: Run the code chunk below to create two separate plots. How do the two plots differ? Which plot’s model appears to fit the data better?
# Plot A
ggplot(data = penguins,
       mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm") +
  labs(title = "Plot A - Linear model")
# Plot B
ggplot(data = penguins,
       mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ poly(x, degree = 2, raw = TRUE)) +
  labs(title = "Plot B - Polynomial model")
Add response here.
Fitting a model with transformed variables
Demo: Fit the polynomial model from above.
bm_fl_poly_fit <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm + TODO, data = penguins)
tidy(bm_fl_poly_fit)
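As a sketch of one way to express the degree-2 polynomial from Plot B in a model formula, the squared term can be wrapped in I() so that ^ is treated as arithmetic rather than formula syntax. The object name bm_fl_poly_sketch is hypothetical and is used here only to keep this example separate from the exercise.

```r
library(tidymodels)
library(palmerpenguins)

# Hypothetical object name; I() protects the squared term so that
# flipper_length_mm^2 is computed arithmetically inside the formula
bm_fl_poly_sketch <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm + I(flipper_length_mm^2), data = penguins)

# One row each for the intercept, the linear term, and the squared term
tidy(bm_fl_poly_sketch)
```

This uses raw (not orthogonal) polynomial terms, which matches the raw = TRUE setting in the Plot B smoother.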
Your turn: Interpret the slope coefficients for flipper length in the context of the data and the research question.
Add response here.
Demo: Predict the body mass of penguins with flipper lengths of 180, 190, 200, and 210 mm.
penguin_flippers <- tibble(
  flipper_length_mm = c(180, 190, 200, 210)
)
predict(bm_fl_poly_fit, new_data = penguin_flippers) |>
# difference in predicted value compared to previous row
mutate(diff = .pred - lag(.pred))
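The diff column is not constant from row to row, and a short derivation shows why. Assume the fitted model is quadratic, written here with illustrative symbols \(b_0\), \(b_1\), \(b_2\) for the estimated coefficients and \(x\) for flipper length in mm, so that \(\widehat{y}(x) = b_0 + b_1 x + b_2 x^2\). The change in predicted body mass for a 10 mm increase in flipper length is then

\[ \widehat{y}(x + 10) - \widehat{y}(x) = 10 b_1 + b_2 \left[ (x + 10)^2 - x^2 \right] = 10 b_1 + b_2 (20x + 100) \]

Because this difference depends on \(x\), the predicted change in body mass per 10 mm depends on the starting flipper length, unlike in the simple linear model where it is always \(10 b_1\).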
Body mass vs. flipper length and island
Next, we will expand our understanding of models by continuing to learn about penguins. So far, we modeled body mass by flipper length and, in a separate model, body mass by island. Could the estimated body mass of a penguin depend on both its flipper length and the island it lives on?
- Demo: Fit a model to predict body mass from flipper length and island. Display the summary output and write out the estimated regression equation below.
bm_fl_island_fit <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm + island, data = penguins)
tidy(bm_fl_island_fit)
\[ \widehat{body~mass} = TODO \]
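As a guide to the structure of this equation (not the fitted values), the additive model has the general form below. The symbols \(b_0\) through \(b_3\) are placeholders introduced here for the estimates in the tidy output, and \(Dream\) and \(Torgersen\) are indicator variables equal to 1 for penguins on that island and 0 otherwise, assuming R's default coding with Biscoe as the baseline level.

\[ \widehat{body~mass} = b_0 + b_1 \times flipper~length + b_2 \times Dream + b_3 \times Torgersen \]

For a Biscoe penguin both indicators are 0, so the equation reduces to \(b_0 + b_1 \times flipper~length\).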
Your turn: Interpret the slope coefficient for flipper length in the context of the data and the research question.
Add response here.
Demo: Predict the body mass of a Dream island penguin with a flipper length of 200 mm.
penguin_200_Dream <- tibble(
  flipper_length_mm = 200,
  island = "Dream"
)
predict(bm_fl_island_fit, new_data = penguin_200_Dream)
Additive vs. interaction models
Review: What assumption does the additive model make about the slopes between flipper length and body mass for each of the three islands?
Add response here.
Demo: Now fit an interaction model of body mass vs. flipper length and island.
bm_fl_island_int_fit <- linear_reg() |>
  fit(body_mass_g ~ TODO, data = penguins)
tidy(bm_fl_island_int_fit)
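As a sketch of how an interaction can be specified in an R formula, the * operator expands to both main effects plus their interaction. The object name bm_fl_island_int_sketch is hypothetical and is used only to keep this example separate from the exercise.

```r
library(tidymodels)
library(palmerpenguins)

# Hypothetical object name; flipper_length_mm * island is shorthand for
# flipper_length_mm + island + flipper_length_mm:island
bm_fl_island_int_sketch <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm * island, data = penguins)

# Six terms: intercept, flipper length, two island offsets,
# and two flipper-length-by-island slope adjustments
tidy(bm_fl_island_int_sketch)
```

The two interaction terms are what allow the flipper length slope to differ from island to island.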
Review: What does modeling body mass with an interaction effect give us that the additive model does not?
Add response here.
Your turn: Predict the body mass of a Dream island penguin with a flipper length of 200 mm.
predict(bm_fl_island_int_fit, new_data = penguin_200_Dream)
Choosing a model
Rule of thumb: Occam’s Razor - Don’t overcomplicate the situation! We prefer the simplest best model.
glance(bm_fl_fit)
glance(bm_fl_poly_fit)
glance(bm_fl_island_fit)
glance(bm_fl_island_int_fit)
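Reading four separate glance() outputs side by side can be awkward, so the sketch below stacks them into one table. It refits the models under hypothetical names (fits, model_comparison) so that it runs on its own, and it assumes the polynomial and interaction models are specified with I() and *, which may differ from how you filled in the TODOs above.

```r
library(tidymodels)
library(palmerpenguins)

# Hypothetical list of fits; the formulas for the polynomial and
# interaction models are assumptions, not the exercise's official answers
fits <- list(
  flipper = linear_reg() |>
    fit(body_mass_g ~ flipper_length_mm, data = penguins),
  flipper_poly = linear_reg() |>
    fit(body_mass_g ~ flipper_length_mm + I(flipper_length_mm^2), data = penguins),
  flipper_island = linear_reg() |>
    fit(body_mass_g ~ flipper_length_mm + island, data = penguins),
  flipper_island_int = linear_reg() |>
    fit(body_mass_g ~ flipper_length_mm * island, data = penguins)
)

# One row of fit statistics per model, labeled by the list names
model_comparison <- fits |>
  map(glance) |>
  bind_rows(.id = "model") |>
  select(model, r.squared, adj.r.squared, AIC)

model_comparison
```

Adjusted R-squared and AIC both penalize additional predictors, so they are more useful than R-squared alone when the models being compared differ in complexity.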
Your turn: Which model do you believe is most appropriate? Why?
Add response here.