5.1 Problem Formulation
In regression, we model the relationship between inputs \(\mathbf{x} \in \mathbb{R}^D\) and outputs \(y \in \mathbb{R}\) in the presence of observation noise. We assume that each observation follows a probabilistic model: \[ p(y \mid \mathbf{x}) = \mathcal{N}(y \mid f(\mathbf{x}), \sigma^2). \] This means the data are generated by \[ y = f(\mathbf{x}) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2), \] where \(\epsilon\) is i.i.d. Gaussian noise with zero mean and variance \(\sigma^2\).
The goal is to find a function \(f(\mathbf{x})\) that:
- Approximates the true (unknown) data-generating function,
- Generalizes well to unseen data.
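The generative model above can be simulated in a few lines. The following is a minimal sketch, assuming a hypothetical true function \(f(x) = \sin(x)\) and noise level \(\sigma = 0.2\) (both are illustrative choices, not part of the model):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Hypothetical true (unknown) function; any f works here."""
    return np.sin(x)

sigma = 0.2                                   # noise standard deviation (assumed known)
x = rng.uniform(-3.0, 3.0, size=50)           # inputs
y = f(x) + rng.normal(0.0, sigma, size=50)    # y = f(x) + eps, eps ~ N(0, sigma^2)
```

Each observed \(y\) is the function value \(f(x)\) corrupted by independent Gaussian noise, which is exactly the generative process written above.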
5.1.1 Parametric Models
We restrict our attention to parametric models, where the function depends on a set of parameters \(\theta\). In linear regression, the model is linear in the parameters: \[ p(y \mid \mathbf{x}, \theta) = \mathcal{N}(y \mid \mathbf{x}^T \theta, \sigma^2), \] or equivalently, \[ y = \mathbf{x}^T \theta + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2). \] Here:
- \(\theta \in \mathbb{R}^D\) represents the model parameters,
- The likelihood expresses how probable a given \(y\) is for known \(\mathbf{x}\) and \(\theta\),
- Without noise (\(\sigma^2 \to 0\)), the likelihood collapses to a Dirac delta and the model becomes deterministic.
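The Gaussian likelihood above is straightforward to evaluate numerically. A minimal sketch, using hypothetical toy data generated by \(\theta = 2\) (the data and parameter values are illustrative assumptions):

```python
import numpy as np

def log_likelihood(theta, X, y, sigma2):
    """Gaussian log-likelihood of y under y = X @ theta + eps,
    eps ~ N(0, sigma2), summed over the N observations."""
    N = X.shape[0]
    residuals = y - X @ theta
    return (-0.5 * N * np.log(2 * np.pi * sigma2)
            - 0.5 * np.sum(residuals**2) / sigma2)

# Hypothetical toy data: three inputs whose outputs follow theta = 2 exactly.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
```

Evaluating `log_likelihood` at the generating parameter \(\theta = 2\) gives a higher value than at, say, \(\theta = 0\), which is the intuition behind maximum likelihood estimation in the sections that follow.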
Example 5.1 Linear Regression Model
For \(x, \theta \in \mathbb{R}\):
- The model \(y = \theta x\) describes a straight line through the origin.
- The slope of the line is given by \(\theta\).
- Different values of \(\theta\) yield different linear functions.
Although this model is linear in both \(\mathbf{x}\) and \(\theta\), we can generalize it by introducing nonlinear feature transformations:
\[
y = \boldsymbol{\phi}(\mathbf{x})^T \theta.
\]
Here, \(\boldsymbol{\phi}(\mathbf{x})\) represents a feature mapping of the input \(\mathbf{x}\).
The model remains linear in the parameters \(\theta\), even if \(\boldsymbol{\phi}(\mathbf{x})\) is nonlinear in \(\mathbf{x}\).
For now, we assume that the noise variance \(\sigma^2\) is known and focus on learning the optimal parameters \(\theta\).
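Linearity in \(\theta\) is what makes these models easy to fit: with any fixed feature mapping \(\boldsymbol{\phi}\), maximum likelihood reduces to least squares. The sketch below assumes polynomial features \(\boldsymbol{\phi}(x) = (1, x, x^2)\) and data generated from a hypothetical quadratic with coefficients \((1, 2, -3)\); both choices are illustrative:

```python
import numpy as np

def poly_features(x, degree):
    """phi(x) = [1, x, x^2, ..., x^degree] for each scalar input."""
    return np.stack([x**k for k in range(degree + 1)], axis=1)

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=30)
# hypothetical data from y = 1 + 2x - 3x^2 plus small Gaussian noise
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0.0, 0.05, size=30)

Phi = poly_features(x, degree=2)                  # N x 3 feature matrix
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # least-squares / ML estimate
```

Although the fitted curve is nonlinear in \(x\), the estimate `theta` is obtained by an ordinary linear solve because the model is linear in \(\theta\).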
Exercises
Exercise 5.1 Finding a regression function requires solving a variety of problems. List and discuss them.
Exercise 5.2 Consider the following data: \[(1,3), (2,4), (3,8), (4,9).\] Find the estimated regression line \[y = \beta_0 + \beta_1 x,\] based on the data. For each \(x_i\), compute both the estimated value of \(y\) and the residuals.
Exercise 5.3 A simple linear regression model is fit, relating plant growth over 1 year (\(y\)) to the amount of fertilizer provided (\(x\)). Twenty-five plants are selected, five assigned to each of the fertilizer levels (12, 15, 18, 21, 24). The results of the model fit are given below:
Regression Coefficients
| Model | B | Std. Error | t | Sig |
|---|---|---|---|---|
| Constant | 8.624 | 1.81 | 4.764 | 0 |
| \(x\) | 0.527 | 0.098 | 5.386 | 0 |
Can we conclude that there is an association between fertilizer and plant growth at the 0.05 significance level?
Exercise 5.4 A multiple regression model is fit, relating salary (\(Y\)) to the following predictor variables: experience (\(X_1\), in years), number of accounts in charge (\(X_2\)), and gender (\(X_3 = 1\) if female, 0 if male). The following ANOVA table and output give the results of fitting the model. Conduct all tests at the 0.05 significance level:
ANOVA Table
| Source | df | SS | MS | F | P-value |
|---|---|---|---|---|---|
| Regression | 3 | 2470.4 | 823.5 | 76.9 | 0 |
| Residual | 21 | 224.7 | 10.7 | | |
| Total | 24 | 2695.1 | | | |
Regression Coefficients
| Model | B | Std. Error | t | Sig |
|---|---|---|---|---|
| Constant | 39.58 | 1.89 | 21.00 | 0 |
| Experience | 3.61 | 0.36 | 10.04 | 0 |
| Accounts | -0.28 | 0.36 | -0.79 | 0.4389 |
| Gender | -3.92 | 1.48 | -2.65 | 0.0149 |
Test whether salary is associated with any of the predictor variables.