Statistical inference is the science of drawing conclusions
about a population based on information contained in a
sample.
One type of inference is hypothesis testing, which concerns
hypotheses about some of the parameters of
the population distribution. These hypotheses usually
specify that a population parameter, such as the population proportion,
has a certain value. We must then decide whether the hypothesis is
consistent with the data obtained in a sample.
Fair Cards?
Let’s consider three experiments with three decks of cards (52 cards
per deck).
Load all three decks of cards.
cards <- read.csv("https://albums.yuanting.lu/sta126/data/cards.csv")
Repeat the command ten times to draw (with replacement) a sample
of 10 cards from the first deck.
How many red cards do you get? What is the proportion of red
cards?
Do you think this is a fair deck?
Repeat the process for the second deck of cards. What proportion
of red cards do you get in a sample of 10 cards (with replacement)? Do
you think it is a fair deck?
sample(cards$deck2, 10, replace = TRUE)
How about the third deck of cards? Do you think it is a fair
deck?
sample(cards$deck3, 10, replace = TRUE)
Finally, let’s reveal the three decks of cards. Were all of
your earlier decisions correct?
table(cards$deck1)
table(cards$deck2)
table(cards$deck3)
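To see how much the proportion of red cards can vary from sample to sample even when a deck is fair, the experiment can be simulated. The sketch below is only an illustration: it builds a hypothetical fair deck locally (26 red, 26 black) rather than using the course's `cards.csv`, and the names `fair_deck`, `red_count`, and `extreme_share`, the seed, and the number of repetitions are all assumptions.

```r
# Hypothetical fair deck built locally (not the course's cards.csv):
# 26 red cards and 26 black cards.
set.seed(126)  # arbitrary seed so the run is reproducible
fair_deck <- rep(c("red", "black"), each = 26)

# Draw 10 cards with replacement, 1000 times, and count the red cards
# in each sample.
red_count <- replicate(1000, sum(sample(fair_deck, 10, replace = TRUE) == "red"))

# How often each count of red cards (0 to 10) occurred.
table(red_count)

# Share of samples with 3 or fewer, or 7 or more, red cards.
extreme_share <- mean(red_count <= 3 | red_count >= 7)
extreme_share
```

A single sample of 10 cards from a fair deck quite often shows a proportion of red cards well away from 0.5, which is why one small sample is weak evidence about fairness.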
Concepts
- Null hypothesis: the hypothesis to be
tested, denoted by \(H_0\).
- Alternative hypothesis: the hypothesis
to be established, denoted by \(H_1\)
or \(H_a\).
- P-value: the (largest) probability of
obtaining test results at least as extreme as the results actually
observed, under the assumption that the null hypothesis is correct.
- Type I error: Reject a true null
hypothesis.
- Type II error: Fail to reject a false
null hypothesis.
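The two error types can be made concrete with an exact calculation. The sketch below assumes a two-sided test of \(H_0: p = 0.5\) with 10 draws at significance level 0.05, using a doubled smaller-tail p-value. The alternative value 0.3 and all variable names are illustrative assumptions, not part of these notes.

```r
n <- 10        # sample size (assumed for illustration)
p0 <- 0.5      # proportion under the null hypothesis
alpha <- 0.05  # significance level
x <- 0:n       # every possible number of red cards

# Two-sided p-value for each outcome: double the smaller tail, capped at 1.
lower_tail <- pbinom(x, n, p0)          # P(X <= x)
upper_tail <- 1 - pbinom(x - 1, n, p0)  # P(X >= x)
p_value <- pmin(1, 2 * pmin(lower_tail, upper_tail))

# Outcomes for which the test rejects the null hypothesis.
reject <- p_value <= alpha
rejection_region <- x[reject]

# Type I error rate: probability of rejecting when p really is 0.5.
type1 <- sum(dbinom(rejection_region, n, p0))

# Type II error rate: probability of not rejecting when p is really 0.3
# (0.3 is a hypothetical alternative chosen only for this example).
type2 <- sum(dbinom(x[!reject], n, 0.3))

rejection_region
type1
type2
```

Under these assumptions the test rejects only for 0, 1, 9, or 10 red cards, so the Type I error rate is 22/1024 (about 0.021), while the Type II error rate against \(p = 0.3\) is about 0.85. With only 10 draws, the test rarely detects even a fairly unbalanced deck.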
Two-Sided Test
\(H_0: p=p_0\)
\(H_1: p\ne p_0\)
Example 1. If we randomly draw (with
replacement) 10 cards from a deck of cards and observe that 3 of
the 10 are red, can we conclude that this is not a fair deck of
cards?
Let \(p\) denote the true proportion
of red cards.
- \(H_0: p = 0.5\)
- \(H_1: p \ne 0.5\)
- P-value:
Based on the binomial distribution (\(n=10\) and \(p=0.5\)):
2 * pbinom(3, 10, 0.5)
## [1] 0.34375
Based on the normal approximation (i.e., the sampling
distribution of \(\widehat p\)), in
which \(\scriptsize E[\widehat p]=p=0.5\) and
\(\scriptsize SD(\widehat p)=\sqrt{p(1-p)/n}=\sqrt{0.5(1-0.5)/10}\), with continuity correction applied:
2 * pnorm(3.5 / 10, 0.5, sqrt(0.5 * (1 - 0.5) / 10))
## [1] 0.3427817
If the cards were fair, the probability of getting 3 or
fewer red cards, or 7 or more red cards in a
sample of 10 cards is about 0.343.
- Conclusion: At a significance level of 0.05, the p-value is
not small: a result at least as extreme as 3 red cards in a sample
of 10 is quite likely when drawing from a fair deck. Therefore, we do not have
sufficient evidence to show that this is not a fair deck of cards.
Did you say we believe the deck of cards is fair? Failing to reject \(H_0\) does NOT mean accepting \(H_0\). It only means that the population
proportion assumed in the null hypothesis is one of many
plausible population proportions.
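The exact p-value in Example 1 can be cross-checked in more than one way. The sketch below adds the two tails directly, doubles the lower tail, and calls base R's `binom.test()`. The variable names are illustrative. The three results agree here because the binomial distribution is symmetric when \(p = 0.5\); for other null values they can differ.

```r
n <- 10    # number of cards drawn
x <- 3     # number of red cards observed
p0 <- 0.5  # proportion of red cards under the null hypothesis

# Add the two tails directly: P(X <= 3) + P(X >= 7).
two_tails <- pbinom(x, n, p0) + (1 - pbinom(n - x - 1, n, p0))

# Double the lower tail.
doubled <- 2 * pbinom(x, n, p0)

# Exact binomial test from base R (stats package).
exact <- binom.test(x, n, p = p0, alternative = "two.sided")$p.value

c(two_tails = two_tails, doubled = doubled, binom_test = exact)
```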
Example 2 (Ross 9.10) Historical data
indicate that 4% of the components produced at a certain manufacturing
facility are defective. A particularly acrimonious labor dispute has
recently been concluded, and management is curious about whether it will
result in any change in this figure of 4 percent. If a random sample of
500 items indicated 16 defectives (3.2%), is this significant evidence,
at the 5% level of significance, to conclude that a change has
occurred?
Let \(p\) be the true proportion of
defective components.
- \(H_0: p = 0.04\)
- \(H_1: p \ne 0.04\)
- P-value:
Based on the binomial distribution (\(n=500\) and \(p=0.04\)):
2 * pbinom(16, 500, 0.04)
## [1] 0.4316072
Based on the normal approximation (i.e., the sampling
distribution of \(\hat p\)), in which
\(\scriptsize E[\hat p]=p=0.04\) and
\(\scriptsize SD(\hat p)=\sqrt{p(1-p)/n}=\sqrt{0.04(1-0.04)/500}\), with continuity correction applied:
2 * pnorm(16.5 / 500, 0.04, sqrt(0.04 * (1 - 0.04) / 500))
## [1] 0.4244284
If the true proportion of defective components were 0.04, the
probability of getting 16 or fewer defectives, or 24
or more defectives, in a sample of 500 units would be about
0.424.
- Conclusion: At a significance level of 0.05, the p-value is
not small, meaning the data we observed are not rare under \(H_0\). Therefore, we do
not reject the hypothesis that the true proportion of defective
components is 0.04; the sample is not significant evidence that a change has occurred.
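The normal-approximation p-value in Example 2 can also be written in terms of a z statistic. The sketch below computes that statistic with the continuity correction and compares the result with base R's `prop.test()`, which applies the same correction when `correct = TRUE`. The variable names are illustrative assumptions.

```r
x <- 16     # observed number of defectives
n <- 500    # sample size
p0 <- 0.04  # proportion of defectives under the null hypothesis

# Standard deviation of the sample proportion under the null hypothesis.
se <- sqrt(p0 * (1 - p0) / n)

# z statistic with continuity correction: the observed count is moved
# half a unit toward the expected count n * p0 = 20.
z <- ((x + 0.5) / n - p0) / se

# Two-sided p-value from the standard normal distribution.
p_normal <- 2 * pnorm(-abs(z))

# Same test via prop.test(), which reports a chi-squared statistic equal to z^2.
p_prop <- prop.test(x, n, p = p0, alternative = "two.sided", correct = TRUE)$p.value

c(z = z, p_normal = p_normal, p_prop = p_prop)
```

Both p-values should match the 0.4244284 computed with `pnorm()` in the example.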
One-Sided Tests
\(H_0: p=p_0\)
\(H_1: p> p_0\) or \(p<p_0\)
Example 3 (Ross 9.8) A noted educator
claims that over half the adult U.S. population is concerned about the
lack of educational programs shown on television. To gather data about
this issue, a national polling service randomly chose and questioned 920
individuals. If 478 (52%) of those surveyed stated that they are
concerned about the lack of educational programs on television, does
this prove the claim of the educator?
Let \(p\) be the true proportion of
people who are concerned about the lack of educational programs on
television.
- \(H_0: p = 0.5\)
- \(H_1: p > 0.5\)
- P-value:
Based on the binomial distribution (\(n=920\) and \(p=0.5\)):
1 - pbinom(477, 920, 0.5)
## [1] 0.1242603
Based on the normal approximation (i.e., the sampling
distribution of \(\hat p\)), in which
\(\scriptsize E[\hat p]=p=0.5\) and
\(\scriptsize SD(\hat p)=\sqrt{p(1-p)/n}=\sqrt{0.5(1-0.5)/920}\), with continuity correction applied:
1 - pnorm(477.5 / 920, 0.5, sqrt(0.5 * (1 - 0.5) / 920))
## [1] 0.1242673
If the true proportion of people who are concerned about the lack of
educational programs on television were 0.5, the probability of seeing
478 or more concerned people in a sample of 920 would be
about 0.124.
- Conclusion: At a significance level of 0.05, the p-value is
not small, meaning the data we observed are not rare under \(H_0\). Therefore, we do
not reject the hypothesis that the true proportion of people who are
concerned about the lack of educational programs on television is
0.5; the data do not prove the educator's claim.
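For a one-sided test such as Example 3, only the upper tail counts toward the p-value. The sketch below computes that tail with `lower.tail = FALSE` and cross-checks it against base R's `binom.test()` with `alternative = "greater"`. The variable names are illustrative assumptions.

```r
x <- 478   # number of concerned respondents
n <- 920   # sample size
p0 <- 0.5  # proportion under the null hypothesis

# Upper-tail probability P(X >= 478), the same quantity as 1 - pbinom(477, 920, 0.5).
p_manual <- pbinom(x - 1, n, p0, lower.tail = FALSE)

# Exact one-sided binomial test from base R (stats package).
p_test <- binom.test(x, n, p = p0, alternative = "greater")$p.value

c(p_manual = p_manual, p_test = p_test)

# TRUE means the p-value exceeds 0.05, so the null hypothesis is not rejected.
p_manual > 0.05
```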
Example 4 (Ross 9.9) A computer chip
manufacturer claims that at most 2% of the chips it produces are
defective. An electronics company, impressed by that claim, has
purchased a large quantity of chips. To determine if the manufacturer’s
claim is plausible, the company has decided to test a sample of 400 of
these chips. If there are 13 defective chips (3.25 percent) among these
400, does this disprove (at the 5 percent level of significance) the
manufacturer’s claim?