Palmer Penguins and regression with multiple predictors

Application exercise
Modified

March 13, 2024

In this application exercise we will continue to study penguins. The data can be found in the palmerpenguins package and we will use tidyverse and tidymodels for data exploration and modeling, respectively.

library(tidyverse)
library(tidymodels)
library(palmerpenguins)
library(skimr)

Please read the following context and take a skim of the data set before we get started.

This data set comprising various measurements of three different penguin species, namely Adelie, Gentoo, and Chinstrap. The rigorous study was conducted in the islands of the Palmer Archipelago, Antarctica. These data were collected from 2007 to 2009 by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research Program, part of the US Long Term Ecological Research Network. The data set is called penguins.

skim(penguins)

Our goal is to understand better how various body measurements and attributes of penguins relate to their body mass.

Body mass vs. flipper length

The regression model for body mass vs. flipper length is as follows.

bm_fl_fit <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm, data = penguins)

tidy(bm_fl_fit)

And this is the regression model for body mass vs. island.

bm_island_fit <- linear_reg() |>
  fit(body_mass_g ~ island, data = penguins)

tidy(bm_island_fit)

Body mass vs. flipper length (transformed)

  • Your turn: Run the code chunk below and create two separate plots. How are the two plots different than each other? Which plot’s model appears to fit the data better?
# Plot A
ggplot(data = penguins, 
       mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm") +
  labs(title = "Plot A - Linear model")

# Plot B
ggplot(data = penguins, 
       mapping = aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ poly(x, degree = 2, raw = TRUE)) +
  labs(title = "Plot B - Polynomial model")

Add response here.

Fitting a model with transformed variables

Demo: Fit the polynomial model from above.

bm_fl_poly_fit <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm + TODO, data = penguins)
tidy(bm_fl_poly_fit)
  • Your turn: Interpret the slope coefficients for flipper length in the context of the data and the research question.

    Add response here.

  • Demo: Predict the body mass of penguins with flipper lengths of 180, 190, 200, and 210 mm.

penguin_flippers <- tibble(
  flipper_length_mm = c(180, 190, 200, 210)
)

predict(bm_fl_poly_fit, new_data = penguin_flippers) |>
  # difference in predicted value compared to previous row
  mutate(diff = .pred - lag(.pred))

Body mass vs. flipper length and island

Next, we will expand our understanding of models by continuing to learn about penguins. So far, we modeled body mass by flipper length, and in a separate model, modeled body mass by island. Could it be possible that the estimated body mass of a penguin changes by both their flipper length AND by the island they are on?

  • Demo: Fit a model to predict body mass from flipper length and island. Display the summary output and write out the estimate regression equation below.
bm_fl_island_fit <- linear_reg() |>
  fit(body_mass_g ~ flipper_length_mm + island, data = penguins)

tidy(bm_fl_island_fit)

\[ \widehat{body~mass} = TODO \]

  • Your turn: Interpret the slope coefficient for flipper length in the context of the data and the research question.

    Add response here.

  • Demo: Predict the body mass of a Dream island penguin with a flipper length of 200 mm.

penguin_200_Dream <- tibble(
  flipper_length_mm = 200,
  island = "Dream"
)

predict(bm_fl_island_fit, new_data = penguin_200_Dream)

Additive vs. interaction models

  • Review: What assumption does the additive model make about the slopes between flipper length and body mass for each of the three islands?

    Add response here.

  • Demo: Now fit an interaction model of body mass vs. flipper length and island.

bm_fl_island_int_fit <- linear_reg() |>
  fit(body_mass_g ~ TODO, data = penguins)

tidy(bm_fl_island_int_fit)
  • Review: What does modeling body mass with an interaction effect get us that without doing so does not?

    Add response here.

  • Your turn: Predict the body mass of a Dream island penguin with a flipper length of 200 mm.

predict(bm_fl_island_int_fit, new_data = penguin_200_Dream)

Choosing a model

Rule of thumb: Occam’s Razor - Don’t overcomplicate the situation! We prefer the simplest best model.

glance(bm_fl_fit)
glance(bm_fl_poly_fit)
glance(bm_fl_island_fit)
glance(bm_fl_island_int_fit)

Your turn: Which model do you believe is most appropriate? Why?

Add response here.