Bechdel

Author

Benjamin Soltoff

Modified

March 6, 2024

In this mini analysis we work with the data used in the FiveThirtyEight story titled “The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”. Your task is to fill in the blanks denoted by ___.

Data and packages

We start with loading the packages we’ll use.

library(fivethirtyeight)
library(tidyverse)

The dataset contains information on 1794 movies released between 1970 and 2013. However, we’ll focus our analysis on movies released between 1990 and 2013.

bechdel90_13 <- bechdel |> 
  filter(between(year, 1990, 2013))

There are 1615 such movies.
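As a quick check on the filter above, between() keeps both endpoints, so films released in 1990 and in 2013 are included. Here is a minimal sketch with made-up years; the toy tibble is hypothetical and not part of the bechdel data.

```r
library(dplyr)

# Hypothetical release years, chosen to straddle the 1990-2013 window
toy <- tibble(year = c(1989, 1990, 2000, 2013, 2014))

# between(x, left, right) is shorthand for x >= left & x <= right
kept <- toy |>
  filter(between(year, 1990, 2013))

kept$year   # the years that survive the filter
nrow(kept)  # how many rows remain
```

The same nrow() call on bechdel90_13 is one way to confirm the count reported above.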

The financial variables we’ll focus on are the following:

  • budget_2013: Budget in 2013 inflation adjusted dollars
  • domgross_2013: Domestic gross (US) in 2013 inflation adjusted dollars
  • intgross_2013: Total International (i.e., worldwide) gross in 2013 inflation adjusted dollars

We’ll also use the binary and clean_test variables for grouping.

Analysis

Let’s take a look at how median budget and gross vary by whether the movie passed the Bechdel test, which is stored in the binary variable.

bechdel90_13 |>
  group_by(binary) |>
  summarize(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
    )
# A tibble: 2 × 4
  binary med_budget med_domgross med_intgross
  <chr>       <dbl>        <dbl>        <dbl>
1 FAIL    48385984.    57318606.    104475669
2 PASS    31070724     45330446.     80124349
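The gross medians above are computed with na.rm = TRUE, presumably because some films lack gross figures. The sketch below uses made-up numbers (the toy tibble is hypothetical) to show what median() returns with and without that argument when a group contains a missing value.

```r
library(dplyr)

# Hypothetical grosses; one PASS film has a missing domestic gross
toy <- tibble(
  binary = c("FAIL", "FAIL", "PASS", "PASS", "PASS"),
  domgross_2013 = c(10, 30, 20, NA, 40)
)

med <- toy |>
  group_by(binary) |>
  summarize(
    # NA whenever any value in the group is missing
    med_default = median(domgross_2013),
    # missing values are dropped before the median is taken
    med_na_rm = median(domgross_2013, na.rm = TRUE)
  )

med
```

In this toy data the PASS group has a missing median by default and a median of 30 once the NA is dropped.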

Next, let’s take a look at how median budget and gross vary by a more detailed indicator of the Bechdel test result. This information is stored in the clean_test variable, which takes on the following values:

  • ok = passes test
  • dubious
  • men = women only talk about men
  • notalk = women don’t talk to each other
  • nowomen = fewer than two women

bechdel90_13 |>
  group_by(clean_test) |>
  summarize(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
    )
# A tibble: 5 × 4
  clean_test med_budget med_domgross med_intgross
  <ord>           <dbl>        <dbl>        <dbl>
1 nowomen     43373066     44891296.    89509349 
2 notalk      56570084.    63890455    123102194 
3 men         39737690.    56392786     99578022.
4 dubious     35790994     49173429     89883201 
5 ok          31070724     45330446.    80124349 

To evaluate how return on investment varies among movies that pass and fail the Bechdel test, we’ll first create a new variable called roi as the ratio of gross (the sum of intgross_2013 and domgross_2013) to budget.

bechdel90_13 <- bechdel90_13 |>
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)
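Because intgross_2013 is described earlier as worldwide gross, summing it with domgross_2013 counts the domestic gross twice. The sketch below, on made-up figures, puts the definition used here next to a worldwide-only ratio for comparison. The toy tibble and the roi_worldwide column are hypothetical and are not used elsewhere in this analysis.

```r
library(dplyr)

# Hypothetical films; all dollar figures are made up for illustration
toy <- tibble(
  title = c("Film A", "Film B"),
  budget_2013 = c(10, 50),
  domgross_2013 = c(40, 60),
  intgross_2013 = c(100, 90)  # treated as worldwide gross, so it includes domestic
)

toy_roi <- toy |>
  mutate(
    # definition used in this analysis
    roi = (intgross_2013 + domgross_2013) / budget_2013,
    # alternative for comparison: worldwide gross only
    roi_worldwide = intgross_2013 / budget_2013
  ) |>
  arrange(desc(roi))

toy_roi
```

The two definitions rank these two toy films the same way, but the sum-based ratio is larger for both.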

Let’s see which movies have the highest return on investment.

bechdel90_13 |>
  arrange(desc(roi)) |> 
  select(title, roi, year)
# A tibble: 1,615 × 3
   title                     roi  year
   <chr>                   <dbl> <int>
 1 Paranormal Activity      671.  2007
 2 The Blair Witch Project  648.  1999
 3 El Mariachi              583.  1992
 4 Clerks.                  258.  1994
 5 In the Company of Men    231.  1997
 6 Napoleon Dynamite        227.  2004
 7 Once                     190.  2006
 8 The Devil Inside         155.  2012
 9 Primer                   142.  2004
10 Fireproof                134.  2008
# ℹ 1,605 more rows

Below is a visualization of the return on investment by test result. However, it’s difficult to see the distributions due to a few extreme observations.

ggplot(data = bechdel90_13, 
       mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
    )

What are those movies with very high returns on investment?

bechdel90_13 |>
  filter(roi > 400) |>
  select(title, budget_2013, domgross_2013, year)
# A tibble: 3 × 4
  title                   budget_2013 domgross_2013  year
  <chr>                         <int>         <dbl> <int>
1 Paranormal Activity          505595     121251476  2007
2 The Blair Witch Project      839077     196538593  1999
3 El Mariachi                   11622       3388636  1992

Zooming in on the movies with roi < 15 provides a better view of how the medians across the categories compare:

ggplot(data = bechdel90_13, mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    subtitle = "Zoomed in to gross-to-budget ratios below 15",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
    ) +
  coord_cartesian(ylim = c(0, 15))
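The plot above zooms with coord_cartesian() rather than setting limits on the y scale. The difference matters for boxplots: zooming leaves the underlying data untouched, while scale limits drop out-of-range values before the medians and quartiles are computed. A minimal sketch on made-up returns (the toy data frame is hypothetical):

```r
library(ggplot2)

# Hypothetical returns for one category, with a single extreme value
toy <- data.frame(clean_test = "ok", roi = c(1, 2, 3, 4, 500))

base_plot <- ggplot(toy, aes(x = clean_test, y = roi)) +
  geom_boxplot()

# Zooming: all five values still feed the boxplot statistics
zoomed <- base_plot + coord_cartesian(ylim = c(0, 15))

# Scale limits: the value of 500 is removed before the statistics are computed
limited <- base_plot + scale_y_continuous(limits = c(0, 15))

# layer_data() returns the computed boxplot statistics; `middle` is the median
median_zoomed <- layer_data(zoomed)$middle
median_limited <- suppressWarnings(layer_data(limited)$middle)  # ggplot2 warns about the removed row

median_zoomed
median_limited
```

In this toy data the zoomed plot keeps the median at 3, whereas the scale-limited plot reports 2.5 because the extreme film is excluded.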

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Ventura 13.5.2
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-01-29
 pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package         * version date (UTC) lib source
 cli               3.6.2   2023-12-11 [1] CRAN (R 4.3.1)
 colorspace        2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
 digest            0.6.34  2024-01-11 [1] CRAN (R 4.3.1)
 dplyr           * 1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
 evaluate          0.23    2023-11-01 [1] CRAN (R 4.3.1)
 fansi             1.0.6   2023-12-08 [1] CRAN (R 4.3.1)
 farver            2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
 fastmap           1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 fivethirtyeight * 0.6.2   2021-10-07 [1] CRAN (R 4.3.0)
 forcats         * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
 generics          0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2         * 3.4.4   2023-10-12 [1] CRAN (R 4.3.1)
 glue              1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
 gtable            0.3.4   2023-08-21 [1] CRAN (R 4.3.0)
 here              1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
 hms               1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
 htmltools         0.5.7   2023-11-03 [1] CRAN (R 4.3.1)
 htmlwidgets       1.6.4   2023-12-06 [1] CRAN (R 4.3.1)
 jsonlite          1.8.8   2023-12-04 [1] CRAN (R 4.3.1)
 knitr             1.45    2023-10-30 [1] CRAN (R 4.3.1)
 labeling          0.4.3   2023-08-29 [1] CRAN (R 4.3.0)
 lifecycle         1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
 lubridate       * 1.9.3   2023-09-27 [1] CRAN (R 4.3.1)
 magrittr          2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
 munsell           0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
 pillar            1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig         2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
 purrr           * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
 R6                2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 readr           * 2.1.5   2024-01-10 [1] CRAN (R 4.3.1)
 rlang             1.1.3   2024-01-10 [1] CRAN (R 4.3.1)
 rmarkdown         2.25    2023-09-18 [1] CRAN (R 4.3.1)
 rprojroot         2.0.4   2023-11-05 [1] CRAN (R 4.3.1)
 rstudioapi        0.15.0  2023-07-07 [1] CRAN (R 4.3.0)
 scales            1.2.1   2024-01-18 [1] Github (r-lib/scales@c8eb772)
 sessioninfo       1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 stringi           1.8.3   2023-12-11 [1] CRAN (R 4.3.1)
 stringr         * 1.5.1   2023-11-14 [1] CRAN (R 4.3.1)
 tibble          * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidyr           * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
 tidyselect        1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
 tidyverse       * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
 timechange        0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
 tzdb              0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
 utf8              1.2.4   2023-10-22 [1] CRAN (R 4.3.1)
 vctrs             0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
 withr             2.5.2   2023-10-30 [1] CRAN (R 4.3.1)
 xfun              0.41    2023-11-01 [1] CRAN (R 4.3.1)
 yaml              2.3.8   2023-12-11 [1] CRAN (R 4.3.1)

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────