In the previous two sections, we worked with categorical variables (e.g., in favor/not in favor, smoker/nonsmoker, defective component/good component), for which we learned how to
Now, we turn to numerical variables. Our objectives remain the same.
Let’s start with a review of the notation.
qnorm(0.99), because this is the 99th percentile
(i.e., only 1% of the data lies above this number). As we have learned before,
\(z_{0.01}\) is the \(z\)-score we need for a 98% confidence
interval, \(z_{0.025}\) is the \(z\)-score we need for a 95% confidence
interval, and so on.
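As a quick check of this notation, the sketch below computes the critical \(z\)-score for two confidence levels. The helper name z_critical is made up for this illustration; it only wraps qnorm().

```r
# Hypothetical helper (not from the text): critical z-score for a
# two-sided confidence interval at the given confidence level.
z_critical <- function(conf_level) {
  tail_area <- (1 - conf_level) / 2   # area above the critical value
  qnorm(1 - tail_area)                # percentile with that area above it
}

z_98 <- z_critical(0.98)   # same as qnorm(0.99), i.e. z_{0.01}
z_95 <- z_critical(0.95)   # same as qnorm(0.975), i.e. z_{0.025}
print(round(c(z_98 = z_98, z_95 = z_95), 3))
```

A 98% interval leaves 1% in each tail, which is why it uses the 99th percentile.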
If the population standard deviation, \(\sigma\), is unknown, a confidence interval for a population mean \(\mu\) is \[\left(\overline{x} - t_{\alpha}\dfrac{s}{\sqrt n}, \quad\overline{x} + t_{\alpha}\dfrac{s}{\sqrt n}\right).\]
What differences have you noticed between the two formulas (known \(\sigma\) vs unknown \(\sigma\))?
When the population standard deviation \(\sigma\) is unknown, we use the sample standard deviation \(s\) to replace it in the formula.
As a result, we need to change the probability model from the standard normal distribution (\(z_\alpha\)) to a t-distribution (\(t_{\alpha}\)). Below is a comparison of the standard normal distribution (red curve) with a t-distribution with 10 degrees of freedom (df=10, black curve).
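The same comparison can be made numerically. The following is a minimal sketch, with grid points chosen only for illustration, that evaluates both densities and the area above 2 under each curve.

```r
# Compare the standard normal density with a t density (df = 10)
# at a few illustrative points.
x <- c(0, 1, 2, 3)
dens <- data.frame(
  x = x,
  normal = dnorm(x),
  t_df10 = dt(x, df = 10)
)
print(round(dens, 4))

# Area above 2 under each curve
upper_normal <- pnorm(2, lower.tail = FALSE)
upper_t10 <- pt(2, df = 10, lower.tail = FALSE)
print(round(c(normal = upper_normal, t_df10 = upper_t10), 4))
```

The t-distribution is slightly lower at the center and has more area in its tails, so its critical values are larger than the corresponding \(z\)-scores.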
The t-distribution has a parameter called the degrees of freedom (df).
The table below compares the R commands for the t-distribution with those for the standard normal distribution. As you can see, the only difference is that we have to supply the degrees of freedom (df) for the t-distribution.
| | T-distribution | Standard Normal Distribution |
|---|---|---|
| Probability | pt(t, df) | pnorm(z) |
| Percentile | qt(p, df) | qnorm(p) |
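To see how the probability and percentile commands relate, here is a short sketch using df = 10, a value chosen only for this illustration. The percentile command finds a cutoff, and the probability command applied to that cutoff returns the original probability.

```r
df <- 10  # degrees of freedom used only for this illustration

# Percentile: the value with 99% of the distribution below it
t_crit <- qt(0.99, df)
z_crit <- qnorm(0.99)

# Probability: the area below that value, which recovers 0.99
p_t <- pt(t_crit, df)
p_z <- pnorm(z_crit)

print(round(c(t_crit = t_crit, z_crit = z_crit), 3))
print(c(p_t = p_t, p_z = p_z))
```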
Use qt(0.99, 10) to see that \(t_{0.01}=2.764\) when df = 10. (We denote it by \(t_{0.01}\) because 1% of the distribution lies above this number.)
qt(0.975, 19) when there are 19 degrees of freedom (sample size minus 1).

Same as (a), except \(t_{0.005}=2.861\), from qt(0.995, 19) when there are 19 degrees of freedom (sample size minus 1).
We are 99% confident that the true mean amount of PCB in the milk of nursing mothers is between 2.55 ppm and 9.05 ppm. We lose precision as we increase the confidence level: the interval gets wider.
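The calculation can be sketched in R as follows. The helper t_interval is hypothetical, and the summary statistics are assumptions: only n = 20 follows from the text (df = 19), while the sample mean and standard deviation are illustrative values picked so that the 99% interval lands close to the one quoted above.

```r
# Hypothetical helper (not from the text): t-based confidence interval
# for a population mean, computed from summary statistics.
t_interval <- function(xbar, s, n, conf_level) {
  tail_area <- (1 - conf_level) / 2          # area in each tail
  t_crit <- qt(1 - tail_area, df = n - 1)    # critical t value
  margin <- t_crit * s / sqrt(n)             # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

# Assumed summary statistics: n = 20 matches df = 19 in the text;
# xbar and s are illustrative values, not given in this section.
xbar <- 5.8
s <- 5.085
n <- 20

ci_95 <- t_interval(xbar, s, n, 0.95)
ci_99 <- t_interval(xbar, s, n, 0.99)
print(round(ci_95, 2))
print(round(ci_99, 2))

# Compare the widths of the two intervals
width_95 <- ci_95[["upper"]] - ci_95[["lower"]]
width_99 <- ci_99[["upper"]] - ci_99[["lower"]]
print(width_99 > width_95)
```

Raising the confidence level from 95% to 99% increases the critical value from \(t_{0.025}\) to \(t_{0.005}\), so the margin of error grows while the sample mean stays the same.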