5.1 Problem Formulation

In regression, we model the relationship between inputs \(\mathbf{x} \in \mathbb{R}^D\) and outputs \(y \in \mathbb{R}\) in the presence of observation noise. We assume that each observation follows a probabilistic model: \[ p(y | \mathbf{x}) = \mathcal{N}(y \mid f(\mathbf{x}), \sigma^2). \] This means the data are generated by: \[ y = f(\mathbf{x}) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2), \] where \(\epsilon\) is i.i.d. Gaussian noise with zero mean and variance \(\sigma^2\).

The goal is to find a function \(f(\mathbf{x})\) that:

  • Approximates the true (unknown) data-generating function,
  • Generalizes well to unseen data.
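This generative view can be made concrete by sampling synthetic data directly from the model. The sketch below is a minimal illustration; the particular choice of \(f\), the noise level, and the sample size are arbitrary and not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown) data-generating function -- chosen here only for illustration.
def f(x):
    return 2.0 * x - 1.0

sigma = 0.5                            # noise standard deviation
x = rng.uniform(-3, 3, size=50)        # inputs x (here D = 1)
eps = rng.normal(0.0, sigma, size=50)  # i.i.d. Gaussian noise, zero mean
y = f(x) + eps                         # observations y = f(x) + eps
```

In practice only the pairs \((x, y)\) are observed; the task is to recover something close to \(f\) from them.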

5.1.1 Parametric Models

We restrict our attention to parametric models, where the function depends on a set of parameters \(\theta\). In linear regression, the model is linear in the parameters: \[ p(y | \mathbf{x}, \theta) = \mathcal{N}(y \mid \mathbf{x}^T \theta, \sigma^2), \] or equivalently, \[ y = \mathbf{x}^T \theta + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2). \] Here:

  • \(\theta \in \mathbb{R}^D\) represents the model parameters,
  • The likelihood expresses how probable a given \(y\) is for known \(\mathbf{x}\) and \(\theta\),
  • Without noise (\(\sigma^2 \to 0\)), the model becomes deterministic (a Dirac delta).
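The points above can be illustrated on synthetic data. In the sketch below, the true parameters, the noise level, and the sample size are all made up for illustration; the least-squares solution shown is the maximizer of the Gaussian likelihood, and the likelihood of a single observation can be evaluated directly from the density.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5
theta_true = np.array([1.5, -0.5])            # assumed true parameters, D = 2

X = rng.normal(size=(100, 2))                 # inputs, one row per example
y = X @ theta_true + rng.normal(0.0, sigma, size=100)

# Closed-form least-squares estimate; for Gaussian noise this is also
# the maximum-likelihood estimate of theta.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gaussian likelihood p(y_i | x_i, theta) of a single observation.
def likelihood(y_i, x_i, theta):
    mu = x_i @ theta
    return np.exp(-(y_i - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
```

With enough data, `theta_hat` lands close to `theta_true`; as \(\sigma^2 \to 0\), the likelihood concentrates on \(y = \mathbf{x}^T\theta\).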

Example 5.1 Linear Regression Model

For \(x, \theta \in \mathbb{R}\):

  • The model \(y = \theta x\) describes a straight line through the origin.
  • The slope of the line is given by \(\theta\).
  • Different values of \(\theta\) yield different linear functions.

Although this model is linear in both \(\mathbf{x}\) and \(\theta\), we can generalize it by introducing nonlinear feature transformations: \[ y = \mathbf{\phi}(\mathbf{x})^T \theta. \] Here, \(\mathbf{\phi}(\mathbf{x})\) represents a feature mapping of the input \(\mathbf{x}\).
The model remains linear in the parameters \(\theta\), even if \(\mathbf{\phi}(\mathbf{x})\) is nonlinear in \(\mathbf{x}\).
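A polynomial feature map is one common choice of \(\mathbf{\phi}\). The sketch below (the target function, degree, and noise level are illustrative assumptions) shows that the fit is still an ordinary linear least-squares problem in \(\theta\), even though the fitted curve is nonlinear in \(x\).

```python
import numpy as np

def phi(x, degree=3):
    """Polynomial feature map phi(x) = (1, x, x^2, ..., x^degree)."""
    return np.stack([x ** k for k in range(degree + 1)], axis=-1)

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=30)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.1, size=30)  # nonlinear target

Phi = phi(x)                                  # design matrix, shape (30, 4)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # linear in theta
y_pred = phi(x) @ theta
```

Replacing `phi` changes the function class without changing the fitting procedure.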

For now, we assume that the noise variance \(\sigma^2\) is known and focus on learning the optimal parameters \(\theta\).


Exercises

Exercise 5.1 Finding a regression function requires solving a variety of problems. List and discuss them.

Exercise 5.2 Consider the following data: \[(1,3), (2,4), (3,8), (4,9).\] Find the estimated regression line \[y = \beta_0 + \beta_1 x,\] based on the data. For each \(x_i\), compute both the estimated value of \(y\) and the residuals.
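One way to check a hand computation for this exercise is numerically, e.g. with NumPy's `polyfit` (the data are those given above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 4.0, 8.0, 9.0])

beta1, beta0 = np.polyfit(x, y, deg=1)  # slope, then intercept
y_hat = beta0 + beta1 * x               # estimated values of y
residuals = y - y_hat                   # residuals e_i = y_i - y_hat_i
```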

Exercise 5.3 A simple linear regression model is fit, relating plant growth over 1 year (\(y\)) to the amount of fertilizer provided (\(x\)). Twenty-five plants are selected, five assigned to each of the fertilizer levels (12, 15, 18, 21, 24). The results of the model fit are given below:

Regression Coefficients

Model       B       Std. Error   t       Sig
Constant    8.624   1.81         4.764   0
\(x\)       0.527   0.098        5.386   0

Can we conclude that there is an association between fertilizer and plant growth at the 0.05 significance level?
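A numerical sketch of the test this exercise asks for, using the tabled values above (the critical value \(t_{0.975,\,23} \approx 2.069\) is taken from a standard t table):

```python
# Two-sided t test for H0: slope = 0 at the 0.05 level.
t_stat = 0.527 / 0.098        # coefficient / standard error (~5.38; table shows 5.386 due to rounding)
df = 25 - 2                   # n - 2 for simple linear regression
t_crit = 2.069                # t_{0.975, 23} from a t table
reject = abs(t_stat) > t_crit # True: reject H0
```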

Exercise 5.4 A multiple regression model is fit, relating salary (Y) to the following predictor variables: experience (\(X_1\), in years), accounts in charge of (\(X_2\)) and gender (\(X_3\) is 1 if female, 0 if male). The following ANOVA table and output gives the results for fitting the model. Conduct all tests at the 0.05 significance level:

ANOVA Table

Source       df    SS       MS      F      P-value
Regression    3    2470.4   823.5   76.9   0
Residual     21     224.7    10.7
Total        24    2695.1

Regression Coefficients

Model        B       Std. Error   t       Sig
Constant     39.58   1.89         21.00   0
Experience    3.61   0.36         10.04   0
Accounts     -0.28   0.36         -0.79   0.4389
Gender       -3.92   1.48         -2.65   0.0149

Test whether salary is associated with any of the predictor variables.
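The overall F test can be checked numerically from the ANOVA table above (the critical value \(F_{0.05;\,3,\,21} \approx 3.07\) is taken from a standard F table):

```python
# Overall F test: H0: beta1 = beta2 = beta3 = 0 at the 0.05 level.
ms_reg = 2470.4 / 3       # MS(Regression) = SS / df
ms_res = 224.7 / 21       # MS(Residual)
f_stat = ms_reg / ms_res  # ~76.9, matching the ANOVA table
f_crit = 3.07             # F_{0.05; 3, 21} from an F table
reject = f_stat > f_crit  # True: reject H0
```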