# A tibble: 146 × 3
   film                    critics audience
   <chr>                     <int>    <int>
 1 Avengers: Age of Ultron      74       86
 2 Cinderella                   85       80
 3 Ant-Man                      80       90
 4 Do You Believe?              18       84
 5 Hot Tub Time Machine 2       14       28
 6 The Water Diviner            63       62
 7 Irrational Man               42       53
 8 Top Five                     86       64
 9 Shaun the Sheep Movie        99       82
10 Love & Mercy                 89       87
# ℹ 136 more rows
Modelling film ratings
What is the relationship between the critics and audience scores for films?
What is your best guess for a film’s audience score if the critics rated it a 73?
Predictor (explanatory variable): critics
audience critics
      86      74
      80      85
      90      80
      84      18
      28      14
      62      63
     ...     ...
Outcome (response variable): audience
audience critics
      86      74
      80      85
      90      80
      84      18
      28      14
      62      63
     ...     ...
Regression line
Regression line: slope
Regression line: intercept
Correlation
Ranges between -1 and 1.
Same sign as the slope.
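A minimal sketch of both properties, assuming only the ten rows displayed in the table at the top rather than the full 146-film data set, and using Python's standard library purely for illustration (the helper name `centered_sums` is hypothetical). It computes the Pearson correlation and the least-squares slope, then checks the range and the shared sign:

```python
import math

# First ten rows of the table shown earlier (not the full 146-film data set).
critics = [74, 85, 80, 18, 14, 63, 42, 86, 99, 89]
audience = [86, 80, 90, 84, 28, 62, 53, 64, 82, 87]


def centered_sums(x, y):
    """Return (S_xy, S_xx, S_yy): sums of products of deviations from the means."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy, s_xx, s_yy


s_xy, s_xx, s_yy = centered_sums(critics, audience)
r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
b1 = s_xy / s_xx                   # least-squares slope of audience on critics

print(f"r = {r:.3f}, slope = {b1:.3f}")
print("r within [-1, 1]:", -1 <= r <= 1)
print("same sign:", (r > 0) == (b1 > 0))
```

Both quantities share the numerator `s_xy` and have positive denominators, which is why their signs always agree. Values computed from these ten rows will differ from those for the full data set.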
Models with a single predictor
Regression model
A regression model is a function that describes the relationship between the outcome, \(Y\), and the predictor, \(X\).
The regression line goes through the center of mass point (the coordinates corresponding to average \(X\) and average \(Y\)): \(b_0 = \bar{Y} - b_1~\bar{X}\)
Slope has the same sign as the correlation coefficient: \(b_1 = r \frac{sd_Y}{sd_X}\)
Sum of the residuals is zero: \(\sum_{i = 1}^n \epsilon_i = 0\)
Residuals and \(X\) values are uncorrelated
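The properties listed above can be checked numerically. The sketch below assumes only the ten rows displayed in the table at the top (not the full 146 films), builds the line from \(b_1 = r \frac{sd_Y}{sd_X}\) and \(b_0 = \bar{Y} - b_1~\bar{X}\), and then tests each property up to floating-point tolerance. The variable names are illustrative, not taken from the original analysis:

```python
import math

# First ten rows of the table shown earlier (not the full 146-film data set).
critics = [74, 85, 80, 18, 14, 63, 42, 86, 99, 89]   # X, the predictor
audience = [86, 80, 90, 84, 28, 62, 53, 64, 82, 87]  # Y, the outcome

n = len(critics)
x_bar = sum(critics) / n
y_bar = sum(audience) / n

# Sample standard deviations and correlation coefficient.
sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in critics) / (n - 1))
sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in audience) / (n - 1))
r = sum((x - x_bar) * (y - y_bar)
        for x, y in zip(critics, audience)) / ((n - 1) * sd_x * sd_y)

# Slope and intercept from the formulas in the text.
b1 = r * sd_y / sd_x
b0 = y_bar - b1 * x_bar

# Residual = observed outcome minus fitted value.
residuals = [y - (b0 + b1 * x) for x, y in zip(critics, audience)]

# Covariance-style sum between residuals and X (zero means uncorrelated).
resid_x_sum = sum((x - x_bar) * e for x, e in zip(critics, residuals))

print("line passes through (x_bar, y_bar):",
      math.isclose(b0 + b1 * x_bar, y_bar))
print("residuals sum to zero:",
      math.isclose(sum(residuals), 0.0, abs_tol=1e-9))
print("residuals uncorrelated with X:",
      math.isclose(resid_x_sum, 0.0, abs_tol=1e-6))
```

With the line in hand, `b0 + b1 * 73` would give a fitted guess for a film that critics rated 73, though only for this ten-row subset.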
Interpreting the slope
The slope of the model for predicting audience score from critics score is 0.519. Which of the following is the best interpretation of this value?
For every one point increase in the critics score, the audience score goes up by 0.519 points, on average.
For every one point increase in the critics score, we expect the audience score to be higher by 0.519 points, on average.
For every one point increase in the critics score, the audience score goes up by 0.519 points.
For every one point increase in the audience score, the critics score goes up by 0.519 points, on average.
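A short worked step may help when weighing these options. Writing the fitted line as \(\hat{Y} = b_0 + b_1 X\), where \(X\) is the critics score and \(\hat{Y}\) is the predicted audience score (the hat notation is an assumption, not used above), compare the predictions at critics scores \(x\) and \(x + 1\):

\[
\hat{Y}(x + 1) - \hat{Y}(x) = \bigl(b_0 + b_1 (x + 1)\bigr) - \bigl(b_0 + b_1 x\bigr) = b_1 = 0.519
\]

The intercept cancels, so the slope is the difference in predicted audience score between films whose critics scores differ by one point. It describes the model's predictions, not a guaranteed change for any single film.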