02:00
Lecture 20
Cornell University
INFO 2950 - Spring 2024
April 9, 2024
Test a claim about a population parameter
Use simulation-based methods to generate the null distribution
Calculate and interpret the p-value
Use the p-value to draw conclusions in the context of the data and the research question
A hypothesis test is a statistical technique used to evaluate competing claims using data
Null hypothesis \(H_0\): An assumption about the population. “There is nothing going on.”
Alternative hypothesis, \(H_A\): A research question about the population. “There is something going on”.
Note: Hypotheses are always at the population level!
Related, but have distinct motivations
As a researcher, you are interested in the average number of marijuana gummies consumed by Cornell students each week. An article on The Cornell Sun suggests that Cornell students consume, on average, 1.5 gummies each week. You are interested in evaluating if The Cornell Sun’s claim is too high. What are your hypotheses?
02:00
As a researcher, you are interested in the average number of marijuana gummies Cornell students consume in a week.
An article on The Cornell Sun suggests that Cornell students consume, on average, 1.5 gummies each week. \(\rightarrow H_0: \mu = 1.5\)
You are interested in evaluating if The Cornell Sun’s estimate is too high. \(\rightarrow H_A: \mu < 1.5\)
Let’s suppose you manage to take a random sample of 1000 Cornell students and ask them how many gummies they consume and calculate the sample average to be \(\bar{x} = 1\).
Assume you live in a world where the null hypothesis is true: \(\mu = 1.5\).
Ask yourself how likely you are to observe the sample statistic, or something even more extreme, in this world: \(\Pr(\bar{x} < 1 | \mu = 1.5)\) = ?
An article on The Cornell Sun suggests that Cornell students consume, on average, 1.5 gummies each week. \(H_0: \mu = 1.5\)
Is The Cornell Sun’s estimate correct? (two sided) \(H_A: \mu \neq 1.5\)
Is The Cornell Sun’s estimate too high? (one sided) \(H_A: \mu < 1.5\)
Do you think yawning is contagious?
An experiment conducted by the MythBusters tested if a person can be subconsciously influenced into yawning if another person near them yawns.
In this study 50 people were randomly assigned to two groups: 34 to a group where a person near them yawned (treatment) and 16 to a control group where they didn’t see someone yawn (control).
The data are in the openintro package: yawn
# A tibble: 4 × 4
# Groups: group [2]
group result n p_hat
<fct> <fct> <int> <dbl>
1 ctrl not yawn 12 0.75
2 ctrl yawn 4 0.25
3 trmt not yawn 24 0.706
4 trmt yawn 10 0.294
Based on the proportions we calculated, do you think yawning is really contagious, i.e. are seeing someone yawn and yawning dependent?
# A tibble: 4 × 4
# Groups: group [2]
group result n p_hat
<fct> <fct> <int> <dbl>
1 ctrl not yawn 12 0.75
2 ctrl yawn 4 0.25
3 trmt not yawn 24 0.706
4 trmt yawn 10 0.294
The observed differences might suggest that yawning is contagious, i.e. seeing someone yawn and yawning are dependent.
But the differences are small enough that we might wonder if they might simply be due to chance.
Perhaps if we were to repeat the experiment, we would see slightly different results.
So we will do just that - well, somewhat - and see what happens.
Instead of actually conducting the experiment many times, we will simulate our results.
Null hypothesis: “There is nothing going on.”
Yawning and seeing someone yawn are independent, yawning is not contagious, observed difference in proportions is simply due to chance.
\[H_0: p_{\text{treatment}} - p_{\text{control}} = 0\]
Alternative hypothesis: “There is something going on.” Yawning and seeing someone yawn are dependent, yawning is contagious, observed difference in proportions is not due to chance.
\[H_a: p_{\text{treatment}} - p_{\text{control}} \neq 0\]
What would you expect the center of the null distribution to be?
What is the p-value, i.e. in what % of the simulations was the simulated difference in sample proportion at least as extreme as the observed difference in sample proportions?
We often use 5% as the cutoff for whether the p-value is low enough that the data are unlikely to have come from the null model. This cutoff value is called the significance level, \(\alpha\).
If p-value < \(\alpha\), reject \(H_0\) in favor of \(H_A\): The data provide convincing evidence for the alternative hypothesis.
If p-value > \(\alpha\), fail to reject \(H_0\) in favor of \(H_A\): The data do not provide convincing evidence for the alternative hypothesis.
The p-value should be set in advance based on your concern of making a false positive (reject the null hypothesis when in fact it is correct) vs. false negative (fail to reject the null hypothesis when in fact it is incorrect).
Scientists are typically most concerned with avoiding false positives and set a significance level of \(\alpha = 0.05\).
What is the conclusion of the hypothesis test?
Do you “buy” this conclusion?
[OC] If You Order Chipotle Online, You Are Probably Getting Less Food
byu/G_NC indataisbeautiful
ae-18
ae-18
(repo name will be suffixed with your GitHub name).