Bechdel
In this mini analysis we work with the data used in the FiveThirtyEight story titled “The Dollar-And-Cents Case Against Hollywood’s Exclusion of Women”. Your task is to fill in the blanks denoted by ___.
Data and packages
We start by loading the packages we’ll use.
library(fivethirtyeight)
library(tidyverse)
The dataset contains information on 1794 movies released between 1970 and 2013. However, we’ll focus our analysis on movies released between 1990 and 2013.
bechdel90_13 <- bechdel |>
  filter(between(year, 1990, 2013))
There are 1615 such movies.
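As a quick check on how this kind of filter behaves, here is a minimal sketch on a hypothetical four-row toy table (not the bechdel data). It assumes only dplyr, and shows that between() keeps both endpoints and that nrow() gives the count of remaining rows.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- tibble(
  title = c("A", "B", "C", "D"),
  year  = c(1989, 1990, 2013, 2014)
)

# between(x, left, right) is shorthand for x >= left & x <= right,
# so rows from 1990 and 2013 are both kept
toy90_13 <- toy |>
  filter(between(year, 1990, 2013))

# Number of rows that survive the filter
nrow(toy90_13)
```

Running nrow() on the filtered bechdel90_13 data frame in the same way is one way to confirm the count stated above.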
The financial variables we’ll focus on are the following:
budget_2013: Budget in 2013 inflation adjusted dollars
domgross_2013: Domestic gross (US) in 2013 inflation adjusted dollars
intgross_2013: Total International (i.e., worldwide) gross in 2013 inflation adjusted dollars
We’ll also use the binary and clean_test variables for grouping.
Analysis
Let’s take a look at how median budget and gross vary by whether the movie passed the Bechdel test, which is stored in the binary variable.
bechdel90_13 |>
  group_by(binary) |>
  summarize(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )
# A tibble: 2 × 4
binary med_budget med_domgross med_intgross
<chr> <dbl> <dbl> <dbl>
1 FAIL 48385984. 57318606. 104475669
2 PASS 31070724 45330446. 80124349
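The summary above passes na.rm = TRUE for the two gross columns but not for the budget, presumably because only the gross columns contain missing values. The sketch below uses a hypothetical toy table (the column name gross and all values are invented) to show what happens to a grouped median with and without na.rm.

```r
library(dplyr)

# Hypothetical toy data with one missing gross value in the FAIL group
toy <- tibble(
  binary = c("FAIL", "FAIL", "FAIL", "PASS", "PASS", "PASS"),
  gross  = c(10, 20, NA, 5, 25, 40)
)

toy_summary <- toy |>
  group_by(binary) |>
  summarize(
    med_raw   = median(gross),               # NA if any value in the group is missing
    med_clean = median(gross, na.rm = TRUE)  # missing values dropped before the median
  )

toy_summary
```

In the FAIL group the unprotected median comes back as NA, while the na.rm = TRUE version is computed from the two observed values.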
Next, let’s take a look at how median budget and gross vary by a more detailed indicator of the Bechdel test result. This information is stored in the clean_test variable, which takes on the following values:
ok = passes test
dubious
men = women only talk about men
notalk = women don’t talk to each other
nowomen = fewer than two women
bechdel90_13 |>
  group_by(clean_test) |>
  summarize(
    med_budget = median(budget_2013),
    med_domgross = median(domgross_2013, na.rm = TRUE),
    med_intgross = median(intgross_2013, na.rm = TRUE)
  )
# A tibble: 5 × 4
clean_test med_budget med_domgross med_intgross
<ord> <dbl> <dbl> <dbl>
1 nowomen 43373066 44891296. 89509349
2 notalk 56570084. 63890455 123102194
3 men 39737690. 56392786 99578022.
4 dubious 35790994 49173429 89883201
5 ok 31070724 45330446. 80124349
To evaluate how return on investment varies among movies that pass and fail the Bechdel test, we’ll first create a new variable called roi as the ratio of total gross (international plus domestic) to budget.
bechdel90_13 <- bechdel90_13 |>
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)
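To make the arithmetic concrete, here is a small worked sketch of the same formula on two hypothetical movies. The titles and dollar amounts are invented for illustration; only the column names and the roi formula come from the analysis.

```r
library(dplyr)

# Hypothetical movies with invented figures (2013 dollars)
toy <- tibble(
  title         = c("Low budget hit", "Big budget flop"),
  budget_2013   = c(1e6, 2e8),
  domgross_2013 = c(3e7, 5e7),
  intgross_2013 = c(5e7, 1.5e8)
)

# Same formula as in the analysis: summed gross divided by budget
toy_roi <- toy |>
  mutate(roi = (intgross_2013 + domgross_2013) / budget_2013)

toy_roi |>
  select(title, roi)
```

The first movie returns (5e7 + 3e7) / 1e6 = 80 times its budget, and the second returns (1.5e8 + 5e7) / 2e8 = 1, which illustrates why very small budgets can produce very large roi values.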
Let’s see which movies have the highest return on investment.
bechdel90_13 |>
  arrange(desc(roi)) |>
  select(title, roi, year)
# A tibble: 1,615 × 3
title roi year
<chr> <dbl> <int>
1 Paranormal Activity 671. 2007
2 The Blair Witch Project 648. 1999
3 El Mariachi 583. 1992
4 Clerks. 258. 1994
5 In the Company of Men 231. 1997
6 Napoleon Dynamite 227. 2004
7 Once 190. 2006
8 The Devil Inside 155. 2012
9 Primer 142. 2004
10 Fireproof 134. 2008
# ℹ 1,605 more rows
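If only the top few rows are of interest, one possible alternative to sorting the whole table is dplyr's slice_max(). The sketch below applies it to a hypothetical toy table with invented values; it is not part of the original analysis.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- tibble(
  title = c("A", "B", "C", "D"),
  roi   = c(2.5, 80, 0.7, 12)
)

# Keep the two rows with the largest roi, returned in descending order
top2 <- toy |>
  slice_max(roi, n = 2)

top2
```

Unlike arrange(desc(roi)), this returns only the requested rows, which can be convenient when the full ranking is not needed.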
Below is a visualization of the return on investment by test result. However, it’s difficult to see the distributions due to a few extreme observations.
ggplot(data = bechdel90_13,
       mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
  )
What are those movies with very high returns on investment?
bechdel90_13 |>
  filter(roi > 400) |>
  select(title, budget_2013, domgross_2013, year)
# A tibble: 3 × 4
title budget_2013 domgross_2013 year
<chr> <int> <dbl> <int>
1 Paranormal Activity 505595 121251476 2007
2 The Blair Witch Project 839077 196538593 1999
3 El Mariachi 11622 3388636 1992
Zooming in on the movies with roi < 15 provides a better view of how the medians across the categories compare:
ggplot(data = bechdel90_13,
       mapping = aes(x = clean_test, y = roi, color = binary)) +
  geom_boxplot() +
  labs(
    title = "Return on investment vs. Bechdel test result",
    subtitle = "Only films with lower than a 15 ratio of gross-to-budget",
    x = "Detailed Bechdel result",
    y = "Return on investment",
    color = "Binary Bechdel result"
  ) +
  coord_cartesian(ylim = c(0, 15))
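Note that coord_cartesian() only crops the view: the boxplot statistics are still computed from every movie, including the extreme ones. The sketch below contrasts this with setting scale limits via ylim(), which drops out-of-range values before the statistics are computed. It uses a hypothetical one-group toy data frame with invented values and reads the computed median back with ggplot2's layer_data().

```r
library(ggplot2)

# Hypothetical toy data: one group with a single extreme value
toy <- data.frame(
  clean_test = "ok",
  roi = c(1, 2, 3, 4, 500)
)

base_plot <- ggplot(toy, aes(x = clean_test, y = roi)) +
  geom_boxplot()

# Zoom: statistics use all five values, then the view is cropped to 0-15
zoomed <- base_plot + coord_cartesian(ylim = c(0, 15))

# Scale limits: the value 500 is removed before statistics are computed
# (ggplot2 warns about the removed row, silenced here)
clipped <- base_plot + ylim(0, 15)

# layer_data() returns the computed boxplot statistics; `middle` is the median
mid_zoomed  <- layer_data(zoomed)$middle
mid_clipped <- suppressWarnings(layer_data(clipped)$middle)

c(zoomed = mid_zoomed, clipped = mid_clipped)
```

In this toy case the zoomed plot keeps the median of all five values (3), while the clipped plot reports the median of the remaining four (2.5). The same distinction applies to the plot above: its boxes summarize all movies, not only those with roi below 15.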
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 (2023-10-31)
os macOS Ventura 13.5.2
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2024-01-29
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cli 3.6.2 2023-12-11 [1] CRAN (R 4.3.1)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
digest 0.6.34 2024-01-11 [1] CRAN (R 4.3.1)
dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.3.1)
evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.1)
fansi 1.0.6 2023-12-08 [1] CRAN (R 4.3.1)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
fivethirtyeight * 0.6.2 2021-10-07 [1] CRAN (R 4.3.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.4.4 2023-10-12 [1] CRAN (R 4.3.1)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.1)
gtable 0.3.4 2023-08-21 [1] CRAN (R 4.3.0)
here 1.0.1 2020-12-13 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.1)
knitr 1.45 2023-10-30 [1] CRAN (R 4.3.1)
labeling 0.4.3 2023-08-29 [1] CRAN (R 4.3.0)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
lubridate * 1.9.3 2023-09-27 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
readr * 2.1.5 2024-01-10 [1] CRAN (R 4.3.1)
rlang 1.1.3 2024-01-10 [1] CRAN (R 4.3.1)
rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)
rprojroot 2.0.4 2023-11-05 [1] CRAN (R 4.3.1)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
scales 1.2.1 2024-01-18 [1] Github (r-lib/scales@c8eb772)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringi 1.8.3 2023-12-11 [1] CRAN (R 4.3.1)
stringr * 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.1)
xfun 0.41 2023-11-01 [1] CRAN (R 4.3.1)
yaml 2.3.8 2023-12-11 [1] CRAN (R 4.3.1)
[1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────