Linear Model
We are often interested in trying to determine the relationship
between a pair of numerical variables. For example,
- Advertising spending and the product sales.
- Hours of study and the test scores.
One of the two variables is called the
input/explanatory/independent variable and
the other is called the response/dependent
variable.
Example 1. Come up with a couple of
bivariate examples. Identify the explanatory and the response
variables.
The relationship between the explanatory and the response variables
can be depicted by a scatter diagram.
Example 2. Is there a relation between
one’s credit rating and one’s credit card limit?
a <- read.csv("https://albums.yuanting.lu/sta126/data/credit.csv")
plot(a$Limit ~ a$Rating, xlab = "Credit Rating", ylab = "Credit Limits")

Example 3. Is there a relation between
one’s credit rating and one’s income?
Example 4. Is there a relation between
one’s credit rating and one’s age?
Linear Correlation Coefficient
\[r = \dfrac{1}{n-1}\sum \left(\dfrac{x_i
- \bar x}{s_x}\right)\left(\dfrac{y_i - \bar y}{s_y}\right)\]
Given the raw data in \(x\) and
\(y\), the \(R\) command for the linear correlation
coefficient is
cor(x, y)
Properties of the linear correlation coefficient \(r\):
- \(-1\le r \le 1\).
- When \(r\) is close to \(1\), there is strong positive linear
correlation.
- When \(r\) is close to \(-1\), there is strong negative linear
correlation.
- When \(r\) is close to 0, there is
little to no linear correlation.
- \(r\) has no unit, i.e., it is a
unitless measure of association.
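As a quick check of the formula and the properties above, the sketch below computes \(r\) term by term and compares it with cor(). The data are made up for illustration (hours of study and test scores) and do not come from the course files.

```r
# Made-up illustration data (not from the course dataset)
hours <- c(1, 2, 3, 4, 5, 6)
score <- c(52, 60, 61, 70, 78, 81)

n <- length(hours)

# z-scores using the sample standard deviation (sd() divides by n - 1)
z_x <- (hours - mean(hours)) / sd(hours)
z_y <- (score - mean(score)) / sd(score)

# r = (1 / (n - 1)) * sum of the products of the z-scores
r_manual <- sum(z_x * z_y) / (n - 1)
r_builtin <- cor(hours, score)

r_manual
r_builtin

# r is unitless: converting hours to minutes does not change it
r_minutes <- cor(hours * 60, score)
r_minutes
```

The two values of \(r\) should agree up to rounding, and rescaling a variable leaves \(r\) unchanged.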
Example 5. Find the linear correlation
coefficient for each pair of variables in Examples 2 through 4. Which pair
has the strongest linear correlation?
Correlation vs. Causation
In observational studies, a correlation between two variables does not by
itself establish a causal relationship between them. Check out some examples of spurious correlations.
Regression Line
\[\widehat{y} = b_1 x + b_0\]
Given the raw data \(x\) and \(y\), the least-squares linear regression
line can be found by the \(R\)
command
lm(y ~ x)
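For reference, "least-squares" means that the slope \(b_1\) and the \(y\)-intercept \(b_0\) are chosen to minimize the sum of squared vertical distances between the observed \(y_i\) and the line. The closed-form solution below is the standard textbook result, written in the notation of the correlation formula rather than taken from these notes:

\[\min_{b_0,\, b_1} \sum \left(y_i - (b_1 x_i + b_0)\right)^2, \qquad b_1 = r\,\dfrac{s_y}{s_x}, \qquad b_0 = \bar y - b_1 \bar x\]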
Example 6. Find the linear regression line
between credit limits and credit ratings. What is the expected limit for
someone whose credit rating is 900?
##
## Call:
## lm(formula = a$Limit ~ a$Rating)
##
## Coefficients:
## (Intercept) a$Rating
## -542.93 14.87
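Using the coefficients printed above, the prediction for a rating of 900 is a direct substitution into \(\widehat{y} = b_1 x + b_0\). The sketch below hard-codes the rounded coefficients from the output, so the result is approximate; the variable names are chosen here for illustration.

```r
# Rounded coefficients copied from the lm() output above
b0 <- -542.93   # intercept
b1 <- 14.87     # slope: change in predicted limit per rating point

rating <- 900
predicted_limit <- b1 * rating + b0
predicted_limit   # about 12840.07
```

With the data frame loaded, predict() can produce the same prediction, provided the model is fitted as lm(Limit ~ Rating, data = a) so that newdata = data.frame(Rating = 900) matches the predictor name.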
Example 7. One of the built-in datasets in
\(R\) is trees.
- Type trees in the \(R\) console to see the data.
- Type ?trees to see detailed information about the data.
Questions:
- Is there a correlation between the volume and the diameter of the
trees?
- Find the least-squares linear regression line.
- Interpret the slope and the \(y\)-intercept in the regression line.
- Predict the mean volume of a tree that has a diameter of 16.5
inches.
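One possible way to approach these questions is sketched below. It assumes the column names documented in ?trees: Girth (the tree diameter in inches), Height, and Volume (in cubic feet). The object names fit and new_tree are chosen here for illustration.

```r
# trees ships with base R (datasets package); Girth is the diameter in inches
r <- cor(trees$Girth, trees$Volume)
r

# Least-squares line: Volume = b1 * Girth + b0
fit <- lm(Volume ~ Girth, data = trees)
coef(fit)

# Predicted mean volume for a tree with a diameter of 16.5 inches
new_tree <- data.frame(Girth = 16.5)
pred <- predict(fit, newdata = new_tree)
pred
```

The slope estimates the change in mean volume (cubic feet) per additional inch of diameter. The intercept is the fitted value at a diameter of 0 inches, which lies far outside the observed diameters, so it has no practical interpretation here.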
Beware of extrapolation. Extrapolation
is the process of predicting values outside the range of the observed data,
assuming the observed trend continues beyond that range. Such predictions
can be unreliable because the data give no evidence that the trend holds there.
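A simple guard against extrapolation is to compare the new value with the range of the observed predictor before predicting. The sketch below does this for the trees data; the helper name in_observed_range is hypothetical, not a built-in function.

```r
# Hypothetical helper: TRUE when new_x lies within the observed range of x
in_observed_range <- function(new_x, x) {
  new_x >= min(x) & new_x <= max(x)
}

range(trees$Girth)                     # smallest and largest observed diameter (inches)
in_observed_range(16.5, trees$Girth)   # TRUE: inside the observed range
in_observed_range(30, trees$Girth)     # FALSE: predicting here would be extrapolation
```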
Example 8. Least-squares explained.
x <- c(3, 5, 7, 9, 11)
y <- c(0, 2, 3, 6, 9)
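A minimal sketch of what "least squares" means for these five points: among all candidate lines, the regression line has the smallest sum of squared residuals. The formulas \(b_1 = r\, s_y / s_x\) and \(b_0 = \bar y - b_1 \bar x\) are the standard closed-form solution, stated here as an addition to the notes. The function name sse is chosen for illustration.

```r
x <- c(3, 5, 7, 9, 11)
y <- c(0, 2, 3, 6, 9)

# Closed-form least-squares slope and intercept
b1 <- cor(x, y) * sd(y) / sd(x)
b0 <- mean(y) - b1 * mean(x)
c(intercept = b0, slope = b1)   # -3.7 and 1.1

# Sum of squared residuals for a candidate line y = slope * x + intercept
sse <- function(intercept, slope) {
  sum((y - (slope * x + intercept))^2)
}

sse(b0, b1)          # 1.6, the smallest achievable value
sse(b0, b1 + 0.1)    # 4.45: a steeper line fits worse
sse(b0 + 0.5, b1)    # 2.85: a shifted line fits worse

# lm() returns the same coefficients
coef(lm(y ~ x))
```

Any change to the slope or the intercept increases the sum of squared residuals, which is what makes this line the least-squares line.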