Statistical inference is the science of drawing conclusions about a population based on information contained in a sample.

One particular type of inference involves testing hypotheses about the parameters of the population distribution. These hypotheses usually specify that a population parameter, such as the population proportion, has a certain value. We must then decide whether this hypothesis is consistent with the data obtained in a sample.

Fair Cards?

Let’s consider three experiments with three decks of cards (52 cards per deck).

  1. Load all three decks of cards.

    cards <- read.csv("https://albums.yuanting.lu/sta126/data/cards.csv")
  2. Run the following command ten times. Each run draws one card, so together the ten runs give a sample of 10 cards drawn (with replacement) from the first deck.

    sample(cards$deck1, 1)
  3. How many red cards do you get? What is the proportion of red cards?

  4. Do you think this is a fair deck?

  5. Repeat the process for the second deck of cards. What proportion of red cards do you get in a sample of 10 cards (with replacement)? Do you think it is a fair deck?

    sample(cards$deck2, 10, replace = TRUE)
  6. How about the third deck of cards? Do you think it is a fair deck?

    sample(cards$deck3, 10, replace = TRUE)
  7. Finally, let’s reveal the three decks of cards. Were all of your earlier decisions correct?

    table(cards$deck1)
    table(cards$deck2)
    table(cards$deck3)
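Before calling a deck unfair, it helps to know how much a fair deck varies from sample to sample. The sketch below simulates the experiment for a fair deck. As an assumption, it builds a hypothetical fair deck directly in R (26 red, 26 black) instead of reading `cards.csv`, because the way that file codes card colors is not shown here. It draws 10 cards with replacement many times and records how many red cards each sample contains.

```r
set.seed(126)                                   # arbitrary seed, for reproducibility
fair_deck <- rep(c("red", "black"), each = 26)  # hypothetical fair deck: 26 red, 26 black

# Each replicate draws 10 cards with replacement and counts the red ones
red_counts <- replicate(10000, sum(sample(fair_deck, 10, replace = TRUE) == "red"))

table(red_counts)        # how often each number of red cards occurs
mean(red_counts <= 3)    # should be close to pbinom(3, 10, 0.5) = 0.171875
```

A fair deck gives 3 or fewer red cards in roughly 17% of samples of 10, so a sample like that is, by itself, weak evidence that the deck is unfair.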

Concepts

  • Null hypothesis: the hypothesis to be tested, denoted by \(H_0\).
  • Alternative hypothesis: the hypothesis to be established, denoted by \(H_1\) or \(H_a\).
  • P-value: the (largest) probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct.
  • Type I error: Reject a true null hypothesis.
  • Type II error: Fail to reject a false null hypothesis.
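To make the two error types concrete, here is a sketch that computes both error probabilities for one decision rule in the card experiment. The rule draws 10 cards and tests \(H_0: p=0.5\) against \(H_1: p \ne 0.5\) at the 0.05 level, using the doubled smaller tail as the two-sided p-value (the same calculation as in Example 1). The alternative value \(p = 0.3\) is hypothetical and was chosen only for illustration.

```r
# Decision rule: n = 10 draws with replacement, H0: p = 0.5 vs H1: p != 0.5, alpha = 0.05
n <- 10
alpha <- 0.05
x <- 0:n                                                  # every possible number of red cards

lower_tail <- pbinom(x, n, 0.5)                           # P(X <= x) under H0
upper_tail <- pbinom(x - 1, n, 0.5, lower.tail = FALSE)   # P(X >= x) under H0
p_values <- pmin(1, 2 * pmin(lower_tail, upper_tail))     # doubled smaller tail, capped at 1

reject <- x[p_values <= alpha]    # counts of red cards that lead us to reject H0
reject                            # 0 1 9 10

# Type I error: probability of rejecting H0 when the deck really is fair
type1 <- sum(dbinom(reject, n, 0.5))
round(type1, 4)                   # 0.0215

# Type II error: probability of NOT rejecting H0 when the deck is unfair
# (p = 0.3 is a hypothetical alternative)
type2 <- 1 - sum(dbinom(reject, n, 0.3))
round(type2, 4)                   # 0.8505
```

Because the binomial distribution is discrete, the actual Type I error probability (about 0.02) is below the nominal 0.05. With only 10 cards, a deck that is just 30% red would go undetected about 85% of the time.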

Two-Sided Test

\(H_0: p=p_0\)

\(H_1: p\ne p_0\)


Example 1. If we randomly draw (with replacement) 10 cards from a deck of standard cards, and observe 3 out of 10 are red, then can we conclude that this is not a fair deck of cards?

Let \(p\) denote the true proportion of red cards.

  • \(H_0: p = 0.5\)
  • \(H_1: p \ne 0.5\)
  • P-value:
    • Based on the binomial distribution (\(n=10\) and \(p=0.5\)):

      2 * pbinom(3, 10, 0.5)
      ## [1] 0.34375
    • Based on the normal approximation (i.e., the sampling distribution of \(\widehat p\)), in which \(\scriptsize E[\widehat p]=p=0.5\) and \(\scriptsize SD(\widehat p)=\sqrt{p(1-p)/n}=\sqrt{0.5(1-0.5)/10}\), with a continuity correction applied:

      2 * pnorm(3.5 / 10, 0.5, sqrt(0.5 * (1 - 0.5) / 10))
      ## [1] 0.3427817

      If the deck were fair, the probability of getting 3 or fewer red cards, or 7 or more red cards, in a sample of 10 cards would be about 0.34.

  • Conclusion: At the 0.05 significance level, the p-value (about 0.34) is not small. Drawing 3 or fewer red cards (or 7 or more) in a sample of 10 cards from a fair deck is quite likely. Therefore, we do not have sufficient evidence to conclude that this deck is unfair.

Did you say we believe the deck of cards is fair? Not rejecting \(H_0\) does NOT mean accepting \(H_0\). It only means that the population proportion assumed in the null hypothesis is one of many plausible population proportions.
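Base R also has built-in tests for a single proportion: `binom.test()` (exact binomial test) and `prop.test()` (normal approximation, with a continuity correction by default). The sketch below uses them to cross-check Example 1. Because \(p_0 = 0.5\) makes the binomial distribution symmetric, the two-sided `binom.test()` p-value equals the doubled tail computed above. `binom.test()` also reports a 95% (Clopper-Pearson) confidence interval for \(p\). That interval contains 0.5 together with many other values, which illustrates the point just made.

```r
# Example 1 revisited: 3 red cards in 10 draws; H0: p = 0.5 vs H1: p != 0.5
exact_test  <- binom.test(3, 10, p = 0.5)   # exact binomial test
approx_test <- prop.test(3, 10, p = 0.5)    # normal approximation, continuity correction on by default

exact_test$p.value    # same as 2 * pbinom(3, 10, 0.5) = 0.34375
approx_test$p.value   # same as the pnorm() calculation, about 0.343
exact_test$conf.int   # 95% interval for p: contains 0.5, but also values such as 0.3
```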


Example 2 (Ross 9.10) Historical data indicate that 4% of the components produced at a certain manufacturing facility are defective. A particularly acrimonious labor dispute has recently been concluded, and management is curious about whether it will result in any change in this figure of 4 percent. If a random sample of 500 items indicated 16 defectives (3.2%), is this significant evidence, at the 5% level of significance, to conclude that a change has occurred?

Let \(p\) be the true proportion of defective components.

  • \(H_0: p = 0.04\)
  • \(H_1: p \ne 0.04\)
  • P-value:
    • Based on the binomial distribution (\(n=500\) and \(p=0.04\)):

      2 * pbinom(16, 500, 0.04)
      ## [1] 0.4316072
    • Based on the normal approximation (i.e., the sampling distribution of \(\hat p\)), in which \(\scriptsize E[\hat p]=p=0.04\) and \(\scriptsize SD(\hat p)=\sqrt{p(1-p)/n}=\sqrt{0.04(1-0.04)/500}\), with a continuity correction applied:

      2 * pnorm(16.5 / 500, 0.04, sqrt(0.04 * (1 - 0.04) / 500))
      ## [1] 0.4244284

      If the true proportion of defective components were 0.04, the probability of getting 16 or fewer defectives, or 24 or more defectives, in a sample of 500 units would be about 0.42 (about 0.43 using the exact binomial calculation).

  • Conclusion: At the 0.05 significance level, the p-value is not small, which means the observed result is not rare if \(p = 0.04\). Therefore, we do not reject the hypothesis that the true proportion of defective components is 0.04. The sample does not provide significant evidence that a change has occurred.
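The same calculation can be checked with base R's `prop.test()` (normal approximation, continuity correction on by default), which reproduces the `pnorm()` result. The exact `binom.test()` is also shown, with a caveat. Here \(p_0 = 0.04\), so the binomial distribution is not symmetric. `binom.test()` builds its two-sided p-value by adding the probabilities of all outcomes that are no more likely than the observed one, rather than doubling the lower tail, so its value need not equal 0.4316. Either way, the p-value is far above 0.05.

```r
# Example 2: 16 defectives in 500 items; H0: p = 0.04 vs H1: p != 0.04
approx_test <- prop.test(16, 500, p = 0.04)   # continuity correction on by default
approx_test$p.value   # same as 2 * pnorm(16.5 / 500, 0.04, sqrt(0.04 * (1 - 0.04) / 500)), about 0.424

exact_test <- binom.test(16, 500, p = 0.04)
exact_test$p.value    # exact two-sided p-value; need not equal 2 * pbinom(16, 500, 0.04)
```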

One-Sided Tests

\(H_0: p=p_0\)

\(H_1: p > p_0\) or \(H_1: p < p_0\)


Example 3 (Ross 9.8) A noted educator claims that over half the adult U.S. population is concerned about the lack of educational programs shown on television. To gather data about this issue, a national polling service randomly chose and questioned 920 individuals. If 478 (52%) of those surveyed stated that they are concerned about the lack of educational programs on television, does this prove the claim of the educator?

Let \(p\) be the true proportion of people who are concerned about the lack of educational programs on television.

  • \(H_0: p = 0.5\)
  • \(H_1: p > 0.5\)
  • P-value:
    • Based on the binomial distribution (\(n=920\) and \(p=0.5\)):

      1 - pbinom(477, 920, 0.5)
      ## [1] 0.1242603
    • Based on the normal approximation (i.e., the sampling distribution of \(\hat p\)), in which \(\scriptsize E[\hat p]=p=0.5\) and \(\scriptsize SD(\hat p)=\sqrt{p(1-p)/n}=\sqrt{0.5(1-0.5)/920}\), with a continuity correction applied:

      1 - pnorm(477.5 / 920, 0.5, sqrt(0.5 * (1 - 0.5) / 920))
      ## [1] 0.1242673

      If the true proportion of people who are concerned about the lack of educational programs on television were 0.5, the probability of seeing 478 or more concerned people in a sample of 920 would be about 0.124.

  • Conclusion: At the 0.05 significance level, the p-value is not small, which means the observed result is not rare if \(p = 0.5\). Therefore, we do not reject the hypothesis that the true proportion of people who are concerned about the lack of educational programs on television is 0.5. The survey does not prove the educator’s claim.
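For a one-sided test, base R's `binom.test()` and `prop.test()` take `alternative = "greater"`, which corresponds to \(H_1: p > p_0\). The sketch below cross-checks Example 3 with them. It also shows `lower.tail = FALSE`, which returns the upper-tail probability directly instead of subtracting from 1.

```r
# Example 3: 478 of 920 concerned; H0: p = 0.5 vs H1: p > 0.5
exact_test <- binom.test(478, 920, p = 0.5, alternative = "greater")
exact_test$p.value    # same as 1 - pbinom(477, 920, 0.5), about 0.124

approx_test <- prop.test(478, 920, p = 0.5, alternative = "greater")
approx_test$p.value   # same as 1 - pnorm(477.5 / 920, 0.5, sqrt(0.5 * (1 - 0.5) / 920)), about 0.124

# P(X >= 478) computed directly from the upper tail
pbinom(477, 920, 0.5, lower.tail = FALSE)
```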

Example 4 (Ross 9.9) A computer chip manufacturer claims that at most 2% of the chips it produces are defective. An electronics company, impressed by that claim, has purchased a large quantity of chips. To determine if the manufacturer’s claim is plausible, the company has decided to test a sample of 400 of these chips. If there are 13 defective chips (3.25 percent) among these 400, does this disprove (at the 5 percent level of significance) the manufacturer’s claim?