5.3 Gradients of Vector-Valued Functions
In previous sections, we studied gradients of scalar functions \(f: \mathbb{R}^n \to \mathbb{R}\). We now generalize to vector-valued functions \(\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m\), where \(m > 1\).
Definition 5.7 A vector-valued function \(\mathbf{f}:\mathbb{R}^n \rightarrow \mathbb{R}^m\) can be written as: \[ \mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{bmatrix} \in \mathbb{R}^m \] where each \(f_i: \mathbb{R}^n \to \mathbb{R}\).
Example 5.8 Define the function \[ \mathbf{r}(t) = \begin{bmatrix} \cos t \\ \sin t \end{bmatrix}, \quad t \in \mathbb{R}. \] Each input \(t\) produces a 2D vector. This function traces out the unit circle in \(\mathbb{R}^2\).
Example 5.9 Define \[ \mathbf{r}(t) = \begin{bmatrix} t \\ t^2 \\ e^t \end{bmatrix}. \] The input is a scalar \(t\). The output is a vector in \(\mathbb{R}^3\). Each component is an ordinary real-valued function.
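The function in Example 5.9 is easy to evaluate directly; a minimal sketch, assuming NumPy is available (the function name `r` mirrors the text):

```python
import numpy as np

# The vector-valued function from Example 5.9: a scalar in, a vector in R^3 out.
def r(t):
    return np.array([t, t**2, np.exp(t)])

v = r(1.0)  # each component is an ordinary real-valued function of t
```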
Definition 5.8 The Jacobian collects all first-order partial derivatives of \(\mathbf{f}\): \[ \mathbf{J} = \nabla_{\mathbf{x}} \mathbf{f} = \frac{d\mathbf{f}(\mathbf{x})}{d\mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} \in \mathbb{R}^{m \times n} \]
Each element \(\mathbf{J}(i, j) = \frac{\partial f_i}{\partial x_j}\) gives the rate of change of the \(i^{th}\) output with respect to the \(j^{th}\) input.
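This element-wise definition can be approximated numerically with central differences. A minimal sketch, assuming NumPy (the helper name `numerical_jacobian` is ours, not from the text):

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """Approximate the m x n Jacobian of f at x by central differences:
    J[i, j] ~ (f_i(x + h e_j) - f_i(x - h e_j)) / (2h)."""
    x = np.asarray(x, dtype=float)
    m, n = f(x).shape[0], x.shape[0]
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        # Column j holds the rate of change of every output w.r.t. input j.
        J[:, j] = (f(x + e) - f(x - e)) / (2 * h)
    return J
```

Such a routine is also a standard way to unit-test hand-derived Jacobians.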
Example 5.10 Consider the vector-valued function
\[
\mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^3
\]
defined by
\[
\mathbf{f}(x,y) =
\begin{bmatrix}
x^2 + y \\
xy \\
\sin x
\end{bmatrix}.
\]
The partial derivatives are:
- \(f_1(x,y) = x^2 + y\) \[ \frac{\partial f_1}{\partial x} = 2x, \quad \frac{\partial f_1}{\partial y} = 1 \]
- \(f_2(x,y) = xy\) \[ \frac{\partial f_2}{\partial x} = y, \quad \frac{\partial f_2}{\partial y} = x \]
- \(f_3(x,y) = \sin x\) \[ \frac{\partial f_3}{\partial x} = \cos x, \quad \frac{\partial f_3}{\partial y} = 0 \]
Thus, the Jacobian for \(\mathbf{f}\) is \[ \mathbf{J}_{\mathbf{f}}(x,y) = \begin{bmatrix} 2x & 1 \\ y & x \\ \cos x & 0 \end{bmatrix}. \]
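The hand computation above can be reproduced symbolically; a sketch assuming SymPy is available:

```python
import sympy as sp

x, y = sp.symbols('x y')
# The function from Example 5.10, f: R^2 -> R^3.
f = sp.Matrix([x**2 + y, x * y, sp.sin(x)])

# jacobian() stacks the partials exactly as in the text:
# rows index the outputs f_1..f_3, columns index the inputs x, y.
J = f.jacobian([x, y])
```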
This book adopts the numerator layout: the derivative \(\frac{d\mathbf{f}}{d\mathbf{x}}\) is an \(m \times n\) matrix, with rows corresponding to function outputs and columns to input variables.
The Jacobian determinant \(|\det(\mathbf{J})|\) represents how a transformation scales areas or volumes. For example, a mapping with \(|\det(\mathbf{J})| = 3\) triples the area (or volume) of a region. This property becomes important in probability (e.g., change of variables in Section 6.7).
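The area-scaling claim is easy to verify for a linear map; a sketch assuming NumPy (the particular matrix is our own illustrative choice):

```python
import numpy as np

# An illustrative shear-and-stretch with |det| = 3.
A = np.array([[3.0, 1.0],
              [0.0, 1.0]])

# The unit square maps to the parallelogram spanned by A's columns;
# its area equals |det(A)|, so this map triples areas.
area_scale = abs(np.linalg.det(A))
```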
For linear mappings, such as \(\mathbf{y} = \mathbf{J}\mathbf{x}\), the Jacobian is simply the transformation matrix \(\mathbf{J}\). For nonlinear mappings, the Jacobian provides a local linear approximation around a point.
Example 5.11 Given \(\mathbf{f}(\mathbf{x}) = \mathbf{A}\mathbf{x}\), where \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(\mathbf{x} \in \mathbb{R}^n\): \[ \frac{d\mathbf{f}}{d\mathbf{x}} = \mathbf{A} \] since each element \(\frac{\partial f_i}{\partial x_j} = A_{ij}\).
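Example 5.11 can be confirmed numerically: central differences applied to a random linear map should recover \(\mathbf{A}\) itself. A sketch assuming NumPy (shapes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # A in R^{3 x 4}, chosen at random
x = rng.standard_normal(4)

f = lambda x: A @ x

# Central differences build the Jacobian column by column;
# for a linear map the result coincides with A.
h = 1e-6
J = np.column_stack([(f(x + h * e) - f(x - h * e)) / (2 * h)
                     for e in np.eye(4)])
```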
Definition 5.9 For compositions \(h(t) = f(\mathbf{g}(t))\) with \(f: \mathbb{R}^2 \to \mathbb{R}\) and \(\mathbf{g}: \mathbb{R} \to \mathbb{R}^2\), the chain rule generalizes to: \[ \frac{dh}{dt} = \frac{\partial f}{\partial \mathbf{g}} \frac{d\mathbf{g}}{dt} \] where matrix multiplication replaces scalar multiplication in the standard chain rule.
Example 5.12 Let
\[
\mathbf{g} : \mathbb{R} \to \mathbb{R}^2
\quad \text{and} \quad
f : \mathbb{R}^2 \to \mathbb{R}
\]
be defined by
\[
\mathbf{g}(t) =
\begin{bmatrix}
t^2 \\
\sin t
\end{bmatrix},
\qquad
f(x,y) = x y.
\]
Define the composition \[ h(t) = f(\mathbf{g}(t)). \]
Step 1: Write the Composition Explicitly
Substitute \(\mathbf{g}(t)\) into \(f\): \[ h(t) = f(t^2, \sin t) = t^2 \sin t. \]
Step 2: Apply the Chain Rule
The multivariable chain rule states: \[ h'(t) = \nabla f(\mathbf{g}(t)) \cdot \mathbf{g}'(t). \]
Step 3: Compute Each Component
- Gradient of \(f\): \[ \nabla f(x,y) = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}. \]
Evaluate at \(\mathbf{g}(t) = (t^2, \sin t)\): \[ \nabla f(\mathbf{g}(t)) = \begin{bmatrix} \sin t \\ t^2 \end{bmatrix}. \]
- Derivative of \(\mathbf{g}(t)\): \[ \mathbf{g}'(t) = \begin{bmatrix} 2t \\ \cos t \end{bmatrix}. \]
Step 4: Compute \(h'(t)\)
\[ h'(t) = \nabla f(\mathbf{g}(t)) \cdot \mathbf{g}'(t) = \begin{bmatrix} \sin t & t^2 \end{bmatrix} \begin{bmatrix} 2t \\ \cos t \end{bmatrix} = 2t \sin t + t^2 \cos t. \]
To check, we can differentiate directly: \[ h(t) = t^2 \sin t \quad \Rightarrow \quad h'(t) = 2t \sin t + t^2 \cos t, \] which agrees with the chain rule result.
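The same agreement can be checked numerically; a small sketch assuming NumPy (the evaluation point \(t = 1.3\) is arbitrary):

```python
import numpy as np

h = lambda t: t**2 * np.sin(t)

# The derivative obtained via the chain rule in Example 5.12.
h_prime = lambda t: 2 * t * np.sin(t) + t**2 * np.cos(t)

# Compare against a central-difference approximation at t = 1.3.
t, eps = 1.3, 1e-6
fd = (h(t + eps) - h(t - eps)) / (2 * eps)
```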
Example 5.13 For a linear model \(\mathbf{y} = \mathbf{\Phi} \mathbf{\theta}\) with residuals \(\mathbf{e}(\mathbf{\theta}) = \mathbf{y} - \mathbf{\Phi} \mathbf{\theta}\), the least-squares loss is: \[ L( \mathbf{e} ) = \| \mathbf{e} \|^2 = \mathbf{e}^\top \mathbf{e} \] Applying the chain rule gives: \[\begin{align*} \frac{\partial L}{\partial \mathbf{\theta}} &= \frac{\partial L}{\partial \mathbf{e}}\frac{\partial \mathbf{e}}{\partial \mathbf{\theta}} \\ &= -2 \mathbf{e}^\top \mathbf{\Phi}\\ &= -2 (\mathbf{y}^\top - \mathbf{\theta}^\top \mathbf{\Phi}^\top) \mathbf{\Phi}. \end{align*}\] This result forms the basis for optimization in linear regression (explored further in Chapter 9).
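The least-squares gradient \(-2\mathbf{e}^\top \mathbf{\Phi}\) can likewise be verified against finite differences; a sketch assuming NumPy (the sizes of \(\mathbf{\Phi}\), \(\mathbf{y}\), and \(\mathbf{\theta}\) below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.standard_normal((5, 2))   # design matrix (illustrative shape)
y = rng.standard_normal(5)
theta = rng.standard_normal(2)

loss = lambda th: np.sum((y - Phi @ th) ** 2)   # L(e) = e^T e

# Analytic gradient from the chain rule: dL/dtheta = -2 e^T Phi.
e = y - Phi @ theta
grad = -2 * e @ Phi

# Finite-difference check, one coordinate direction at a time.
h = 1e-6
fd = np.array([(loss(theta + h * d) - loss(theta - h * d)) / (2 * h)
               for d in np.eye(2)])
```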
5.3.1 Dimensional Summary of Derivatives
| Function Type | Gradient Dimension |
|---|---|
| \(f: \mathbb{R} \to \mathbb{R}\) | Scalar (1 × 1) |
| \(f: \mathbb{R}^D \to \mathbb{R}\) | Row vector (1 × D) |
| \(f: \mathbb{R} \to \mathbb{R}^E\) | Column vector (E × 1) |
| \(f: \mathbb{R}^D \to \mathbb{R}^E\) | Matrix (E × D) |
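The last row of the table can be sanity-checked with array shapes; a sketch with illustrative dimensions, assuming NumPy:

```python
import numpy as np

D, E = 4, 3   # illustrative input and output dimensions

# f: R^D -> R^E given by f(x) = A x; in numerator layout its
# derivative is the E x D matrix A, matching the last table row.
A = np.ones((E, D))
x = np.ones(D)
out = A @ x
```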
Exercises
Exercise 5.17 Compute the Jacobian for \(x = 4u-3v^2\) and \(y = u^2-6v\).
Exercise 5.18 Compute the Jacobian for \(x = \sqrt{u}\) and \(y = 10u + v\).
Exercise 5.19 Compute the Jacobian for \(x = v^3u\) and \(y = u^2/v\).
Exercise 5.20 Compute the Jacobian for \(x = u^2v^3\) and \(y = 4-2\sqrt{u}\).
Exercise 5.21 Define a matrix \(\mathbf{A}\) to be \(3 \times 3\) and a vector \(\mathbf{x}\) to be length 3. Define \(\mathbf{f}(\mathbf{x}) = \mathbf{A} \mathbf{x}\). Compute \(d \mathbf{f}/d \mathbf{x}\).
Exercise 5.22 Let \(\mathbf{r}(t) = \left[t^2 + 1,\ 3-t,\ t^3 \right]\). Find the unit tangent vector \(\mathbf{T}(t) = \mathbf{r}'(t)/\lVert \mathbf{r}'(t) \rVert\).
Exercise 5.23
Prove each of the following rules. \[\begin{array}{lrcll} \mathrm{i.} & \dfrac{d}{dt}[c\mathbf{r}(t)] & = & c\mathbf{r}'(t) & \text{Scalar multiple} \\ \mathrm{ii.} & \dfrac{d}{dt}[\mathbf{r}(t) \pm \mathbf{u}(t)] & = & \mathbf{r}'(t) \pm \mathbf{u}'(t) & \text{Sum and difference} \\ \mathrm{iii.} & \dfrac{d}{dt}[f(t)\mathbf{u}(t)] & = & f'(t)\mathbf{u}(t) + f(t)\mathbf{u}'(t) & \text{Scalar product} \\ \mathrm{iv.} & \dfrac{d}{dt}[\mathbf{r}(t) \cdot \mathbf{u}(t)] & = & \mathbf{r}'(t) \cdot \mathbf{u}(t) + \mathbf{r}(t) \cdot \mathbf{u}'(t) & \text{Dot product} \\ \mathrm{v.} & \dfrac{d}{dt}[\mathbf{r}(f(t))] & = & \mathbf{r}'(f(t))\, f'(t) & \text{Chain rule} \\ \mathrm{vi.} & \text{If } \mathbf{r}(t) \cdot \mathbf{r}(t) & = & c, \text{ then } \mathbf{r}(t) \cdot \mathbf{r}'(t) = 0. & \end{array}\]