5.1 Problem Formulation

In regression, we model the relationship between inputs \(\mathbf{x} \in \mathbb{R}^D\) and outputs \(y \in \mathbb{R}\) in the presence of observation noise. We assume that each observation follows a probabilistic model: \[ p(y | \mathbf{x}) = \mathcal{N}(y \mid f(\mathbf{x}), \sigma^2). \] This means the data are generated by: \[ y = f(\mathbf{x}) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2), \] where \(\epsilon\) is i.i.d. Gaussian noise with zero mean and variance \(\sigma^2\).

The goal is to find a function \(f(\mathbf{x})\) that:

  • Approximates the true (unknown) data-generating function,
  • Generalizes well to unseen data.
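This generative view can be made concrete by sampling synthetic data directly from the model. The sketch below is a minimal illustration; the particular choice of \(f\), the noise level, and the sample size are arbitrary and not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown) data-generating function -- chosen here only for illustration.
def f(x):
    return 2.0 * x - 1.0

sigma = 0.5                            # noise standard deviation
x = rng.uniform(-3, 3, size=50)        # inputs x (here D = 1)
eps = rng.normal(0.0, sigma, size=50)  # i.i.d. Gaussian noise, zero mean
y = f(x) + eps                         # observations y = f(x) + eps
```

In practice only the pairs \((x, y)\) are observed; the task is to recover something close to \(f\) from them.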

5.1.1 Parametric Models

We restrict our attention to parametric models, where the function depends on a set of parameters \(\theta\). In linear regression, the model is linear in the parameters: \[ p(y | \mathbf{x}, \theta) = \mathcal{N}(y \mid \mathbf{x}^T \theta, \sigma^2), \] or equivalently, \[ y = \mathbf{x}^T \theta + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2). \] Here:

  • \(\theta \in \mathbb{R}^D\) represents the model parameters,
  • The likelihood expresses how probable a given \(y\) is for known \(\mathbf{x}\) and \(\theta\),
  • Without noise (\(\sigma^2 \to 0\)), the model becomes deterministic (a Dirac delta).
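The points above can be illustrated on synthetic data. In the sketch below, the true parameters, the noise level, and the sample size are all made up for illustration; the least-squares solution shown is the maximizer of the Gaussian likelihood, and the likelihood of a single observation can be evaluated directly from the density.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.5
theta_true = np.array([1.5, -0.5])            # assumed true parameters, D = 2

X = rng.normal(size=(100, 2))                 # inputs, one row per example
y = X @ theta_true + rng.normal(0.0, sigma, size=100)

# Closed-form least-squares estimate; for Gaussian noise this is also
# the maximum-likelihood estimate of theta.
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gaussian likelihood p(y_i | x_i, theta) of a single observation.
def likelihood(y_i, x_i, theta):
    mu = x_i @ theta
    return np.exp(-(y_i - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
```

With enough data, `theta_hat` lands close to `theta_true`; as \(\sigma^2 \to 0\), the likelihood concentrates on \(y = \mathbf{x}^T\theta\).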

Example 5.1 Linear Regression Model

For \(x, \theta \in \mathbb{R}\):

  • The model \(y = \theta x\) describes a straight line through the origin.
  • The slope of the line is given by \(\theta\).
  • Different values of \(\theta\) yield different linear functions.

Although this model is linear in both \(\mathbf{x}\) and \(\theta\), we can generalize it by introducing nonlinear feature transformations: \[ y = \mathbf{\phi}(\mathbf{x})^T \theta. \] Here, \(\mathbf{\phi}(\mathbf{x})\) represents a feature mapping of the input \(\mathbf{x}\).
The model remains linear in the parameters \(\theta\), even if \(\mathbf{\phi}(\mathbf{x})\) is nonlinear in \(\mathbf{x}\).
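A polynomial feature map is one common choice of \(\mathbf{\phi}\). The sketch below (the target function, degree, and noise level are illustrative assumptions) shows that the fit is still an ordinary linear least-squares problem in \(\theta\), even though the fitted curve is nonlinear in \(x\).

```python
import numpy as np

def phi(x, degree=3):
    """Polynomial feature map phi(x) = (1, x, x^2, ..., x^degree)."""
    return np.stack([x ** k for k in range(degree + 1)], axis=-1)

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=30)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.1, size=30)  # nonlinear target

Phi = phi(x)                                  # design matrix, shape (30, 4)
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # linear in theta
y_pred = phi(x) @ theta
```

Replacing `phi` changes the function class without changing the fitting procedure.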

For now, we assume that the noise variance \(\sigma^2\) is known and focus on learning the optimal parameters \(\theta\).


Exercises

Exercise 5.1 Finding a regression function requires solving a variety of problems. List and discuss them.

Exercise 5.2 Consider the following data: \[(1,3), (2,4), (3,8), (4,9).\] Find the estimated regression line \[y = \beta_0 + \beta_1 x,\] based on the data. For each \(x_i\), compute both the estimated value of \(y\) and the residuals.
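One way to check a hand computation for this exercise is numerically, e.g. with NumPy's `polyfit` (the data are those given above):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 4.0, 8.0, 9.0])

beta1, beta0 = np.polyfit(x, y, deg=1)  # slope, then intercept
y_hat = beta0 + beta1 * x               # estimated values of y
residuals = y - y_hat                   # residuals e_i = y_i - y_hat_i
```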

Exercise 5.3 A simple linear regression model is fit, relating plant growth over 1 year (\(y\)) to the amount of fertilizer provided (\(x\)). Twenty-five plants are selected, five assigned to each of the fertilizer levels (12, 15, 18, 21, 24). The results of the model fit are given below:

Regression Coefficients

Model       B       Std. Error   t       Sig
Constant    8.624   1.81         4.764   0
\(x\)       0.527   0.098        5.386   0

Can we conclude that there is an association between fertilizer and plant growth at the 0.05 significance level?
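A numerical sketch of the test this exercise asks for, using the tabled values above (the critical value \(t_{0.975,\,23} \approx 2.069\) is taken from a standard t table):

```python
# Two-sided t test for H0: slope = 0 at the 0.05 level.
t_stat = 0.527 / 0.098        # coefficient / standard error (~5.38; table shows 5.386 due to rounding)
df = 25 - 2                   # n - 2 for simple linear regression
t_crit = 2.069                # t_{0.975, 23} from a t table
reject = abs(t_stat) > t_crit # True: reject H0
```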

Exercise 5.4 A multiple regression model is fit, relating salary (Y) to the following predictor variables: experience (\(X_1\), in years), accounts in charge of (\(X_2\)) and gender (\(X_3\) is 1 if female, 0 if male). The following ANOVA table and output gives the results for fitting the model. Conduct all tests at the 0.05 significance level:

ANOVA Table

Source       df    SS       MS      F      P-value
Regression    3    2470.4   823.5   76.9   0
Residual     21     224.7    10.7
Total        24    2695.1

Regression Coefficients

Model        B       Std. Error   t       Sig
Constant     39.58   1.89         21.00   0
Experience    3.61   0.36         10.04   0
Accounts     -0.28   0.36         -0.79   0.4389
Gender       -3.92   1.48         -2.65   0.0149

Test whether salary is associated with any of the predictor variables.
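The overall F test can be checked numerically from the ANOVA table above (the critical value \(F_{0.05;\,3,\,21} \approx 3.07\) is taken from a standard F table):

```python
# Overall F test: H0: beta1 = beta2 = beta3 = 0 at the 0.05 level.
ms_reg = 2470.4 / 3       # MS(Regression) = SS / df
ms_res = 224.7 / 21       # MS(Residual)
f_stat = ms_reg / ms_res  # ~76.9, matching the ANOVA table
f_crit = 3.07             # F_{0.05; 3, 21} from an F table
reject = f_stat > f_crit  # True: reject H0
```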