5.3 Gradients of Vector-Valued Functions

In previous sections, we studied gradients of scalar functions \(f: \mathbb{R}^n \to \mathbb{R}\). We now generalize to vector-valued functions \(\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^m\), where \(m > 1\).

Definition 5.7 A vector-valued function \(\mathbf{f}:\mathbb{R}^n \rightarrow \mathbb{R}^m\) can be written as: \[ \mathbf{f}(\mathbf{x}) = \begin{bmatrix} f_1(\mathbf{x}) \\ \vdots \\ f_m(\mathbf{x}) \end{bmatrix} \in \mathbb{R}^m \] where each \(f_i: \mathbb{R}^n \to \mathbb{R}\).

Example 5.8 Define the function \[ \mathbf{r}(t) = \begin{bmatrix} \cos t \\ \sin t \end{bmatrix}, \quad t \in \mathbb{R}. \] Each input \(t\) produces a 2D vector. This function traces out the unit circle in \(\mathbb{R}^2\).
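A quick numerical check makes this concrete. The sketch below (the helper name `r` is ours, not from the text) samples many values of \(t\) and verifies that every output has unit norm, i.e. \(\cos^2 t + \sin^2 t = 1\):

```python
import numpy as np

def r(t):
    """Vector-valued function r(t) = [cos t, sin t]^T."""
    return np.array([np.cos(t), np.sin(t)])

# Sample parameter values and confirm each output lies on the unit circle.
ts = np.linspace(0.0, 2 * np.pi, 100)
norms = [np.linalg.norm(r(t)) for t in ts]
print(np.allclose(norms, 1.0))  # True
```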

Example 5.9 Define \[ \mathbf{r}(t) = \begin{bmatrix} t \\ t^2 \\ e^t \end{bmatrix}. \] The input is a scalar \(t\). The output is a vector in \(\mathbb{R}^3\). Each component is an ordinary real-valued function.

Definition 5.8

  • The Jacobian collects all first-order partial derivatives of \(\mathbf{f}\): \[ \mathbf{J} = \nabla_{\mathbf{x}} \mathbf{f} = \frac{d\mathbf{f}(\mathbf{x})}{d\mathbf{x}} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{bmatrix} \in \mathbb{R}^{m \times n} \]

Each element \(\mathbf{J}(i, j) = \frac{\partial f_i}{\partial x_j}\) gives the rate of change of the \(i^{th}\) output with respect to the \(j^{th}\) input.

Example 5.10 Consider the vector-valued function
\[ \mathbf{f} : \mathbb{R}^2 \to \mathbb{R}^3 \] defined by \[ \mathbf{f}(x,y) = \begin{bmatrix} x^2 + y \\ xy \\ \sin x \end{bmatrix}. \] The partial derivatives are:

  • \(f_1(x,y) = x^2 + y\) \[ \frac{\partial f_1}{\partial x} = 2x, \quad \frac{\partial f_1}{\partial y} = 1 \]
  • \(f_2(x,y) = xy\) \[ \frac{\partial f_2}{\partial x} = y, \quad \frac{\partial f_2}{\partial y} = x \]
  • \(f_3(x,y) = \sin x\) \[ \frac{\partial f_3}{\partial x} = \cos x, \quad \frac{\partial f_3}{\partial y} = 0 \]

Thus, the Jacobian for \(\mathbf{f}\) is \[ \mathbf{J}_{\mathbf{f}}(x,y) = \begin{bmatrix} 2x & 1 \\ y & x \\ \cos x & 0 \end{bmatrix}. \]
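The analytic Jacobian above can be checked numerically with central finite differences. The sketch below (the helper `jacobian_fd` and the test point are our own) compares the two at an arbitrary point:

```python
import numpy as np

def f(v):
    # f(x, y) = [x^2 + y, xy, sin x]^T from Example 5.10
    x, y = v
    return np.array([x**2 + y, x * y, np.sin(x)])

def jacobian_analytic(v):
    x, y = v
    return np.array([[2 * x, 1.0],
                     [y, x],
                     [np.cos(x), 0.0]])

def jacobian_fd(func, v, h=1e-6):
    """Central-difference approximation of the m x n Jacobian."""
    v = np.asarray(v, dtype=float)
    J = np.zeros((func(v).size, v.size))
    for j in range(v.size):
        e = np.zeros_like(v)
        e[j] = h
        J[:, j] = (func(v + e) - func(v - e)) / (2 * h)
    return J

p = np.array([1.2, -0.7])
print(np.allclose(jacobian_analytic(p), jacobian_fd(f, p), atol=1e-6))  # True
```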

This book adopts the numerator layout:
The derivative \(\frac{d\mathbf{f}}{d\mathbf{x}}\) is an \(m \times n\) matrix — rows correspond to function outputs, columns to input variables.

The absolute value of the Jacobian determinant, \(|\det(\mathbf{J})|\), measures how a transformation scales areas or volumes. For example, a mapping with \(|\det(\mathbf{J})| = 3\) triples the area (or volume) of a region. This property becomes important in probability (e.g., change of variables in Section 6.7).
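We can see this area-scaling property directly for a linear map. In this sketch (the particular matrix and the `shoelace` helper are our own choices) a map with \(|\det(\mathbf{J})| = 3\) sends the unit square to a parallelogram of area 3:

```python
import numpy as np

# A linear map with |det J| = 3: it should triple areas.
J = np.array([[3.0, 0.0],
              [1.0, 1.0]])

def shoelace(pts):
    """Area of a simple polygon given its vertices in order."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Map the unit square's corners and measure the image parallelogram.
square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
image = square @ J.T

print(np.isclose(abs(np.linalg.det(J)), 3.0))            # True
print(np.isclose(shoelace(image), 3 * shoelace(square)))  # True
```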

For linear mappings, such as \(\mathbf{y} = \mathbf{J}\mathbf{x}\), the Jacobian is simply the transformation matrix \(\mathbf{J}\). For nonlinear mappings, the Jacobian provides a local linear approximation around a point.

Example 5.11 Given \(\mathbf{f}(\mathbf{x}) = \mathbf{A}\mathbf{x}\), where \(\mathbf{A} \in \mathbb{R}^{M \times N}\) and \(\mathbf{x} \in \mathbb{R}^N\): \[ \frac{d\mathbf{f}}{d\mathbf{x}} = \mathbf{A} \] since each element \(\frac{\partial f_i}{\partial x_j} = A_{ij}\).
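One way to see that the derivative of a linear map is the matrix itself: column \(j\) of the Jacobian is the image of the \(j\)-th standard basis vector. A minimal sketch (the matrix \(\mathbf{A}\) here is an arbitrary choice of ours):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])

def f(x):
    return A @ x

# For a linear map, column j of the Jacobian is f(e_j), the image of the
# j-th standard basis vector; stacking the columns recovers A exactly.
J = np.column_stack([f(np.eye(3)[:, j]) for j in range(3)])
print(np.array_equal(J, A))  # True
```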

Definition 5.9 For compositions \(h(t) = f(\mathbf{g}(t))\) with \(f: \mathbb{R}^2 \to \mathbb{R}\) and \(\mathbf{g}: \mathbb{R} \to \mathbb{R}^2\), the chain rule generalizes to: \[ \frac{dh}{dt} = \frac{\partial f}{\partial \mathbf{g}} \frac{\partial \mathbf{g}}{\partial t} \] where matrix multiplication replaces scalar multiplication in the standard chain rule.

Example 5.12 Let
\[ \mathbf{g} : \mathbb{R} \to \mathbb{R}^2 \quad \text{and} \quad f : \mathbb{R}^2 \to \mathbb{R} \] be defined by \[ \mathbf{g}(t) = \begin{bmatrix} t^2 \\ \sin t \end{bmatrix}, \qquad f(x,y) = x y. \]

Define the composition \[ h(t) = f(\mathbf{g}(t)). \]

Step 1: Write the Composition Explicitly

Substitute \(\mathbf{g}(t)\) into \(f\): \[ h(t) = f(t^2, \sin t) = t^2 \sin t. \]

Step 2: Apply the Chain Rule

The multivariable chain rule states: \[ h'(t) = \nabla f(\mathbf{g}(t)) \cdot \mathbf{g}'(t). \]

Step 3: Compute Each Component

  • Gradient of \(f\): \[ \nabla f(x,y) = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{bmatrix} = \begin{bmatrix} y \\ x \end{bmatrix}. \]

Evaluate at \(\mathbf{g}(t) = (t^2, \sin t)\): \[ \nabla f(\mathbf{g}(t)) = \begin{bmatrix} \sin t \\ t^2 \end{bmatrix}. \]

  • Derivative of \(\mathbf{g}(t)\): \[ \mathbf{g}'(t) = \begin{bmatrix} 2t \\ \cos t \end{bmatrix}. \]

Step 4: Compute \(h'(t)\)

\[ h'(t) = \nabla f(\mathbf{g}(t)) \cdot \mathbf{g}'(t) = \begin{bmatrix} \sin t & t^2 \end{bmatrix} \begin{bmatrix} 2t \\ \cos t \end{bmatrix} = 2t \sin t + t^2 \cos t. \]

To check, we can differentiate directly: \[ h(t) = t^2 \sin t \quad \Rightarrow \quad h'(t) = 2t \sin t + t^2 \cos t, \] which agrees with the chain rule result.
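The same check can be run numerically. The sketch below (all helper names are ours) computes \(h'(t)\) both via the chain rule, \(\nabla f(\mathbf{g}(t)) \cdot \mathbf{g}'(t)\), and by differentiating \(t^2 \sin t\) directly:

```python
import numpy as np

def g(t):
    # g(t) = [t^2, sin t]^T
    return np.array([t**2, np.sin(t)])

def g_prime(t):
    return np.array([2 * t, np.cos(t)])

def grad_f(x, y):
    # f(x, y) = xy, so grad f = [y, x]^T
    return np.array([y, x])

def h_prime_chain(t):
    # Chain rule: h'(t) = grad f(g(t)) . g'(t)
    return grad_f(*g(t)) @ g_prime(t)

def h_prime_direct(t):
    # Direct differentiation of h(t) = t^2 sin t
    return 2 * t * np.sin(t) + t**2 * np.cos(t)

ts = np.linspace(-3.0, 3.0, 7)
print(all(np.isclose(h_prime_chain(t), h_prime_direct(t)) for t in ts))  # True
```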

Example 5.13 For a linear model \(\mathbf{y} = \mathbf{\Phi} \mathbf{\theta}\) with residuals \(\mathbf{e}(\mathbf{\theta}) = \mathbf{y} - \mathbf{\Phi} \mathbf{\theta}\), the least-squares loss is: \[ L( \mathbf{e} ) = \| \mathbf{e} \|^2 = \mathbf{e}^\top \mathbf{e} \] Applying the chain rule, with \(\frac{\partial L}{\partial \mathbf{e}} = 2\mathbf{e}^\top\) and \(\frac{\partial \mathbf{e}}{\partial \mathbf{\theta}} = -\mathbf{\Phi}\), gives: \[\begin{align*} \frac{\partial L}{\partial \mathbf{\theta}} &= \frac{\partial L}{\partial \mathbf{e}}\frac{\partial \mathbf{e}}{\partial \mathbf{\theta}} \\ &= (2 \mathbf{e}^\top)(-\mathbf{\Phi}) \\ &= -2 \mathbf{e}^\top \mathbf{\Phi}\\ &= -2 (\mathbf{y}^\top - \mathbf{\theta}^\top \mathbf{\Phi}^\top) \mathbf{\Phi}. \end{align*}\] This result forms the basis for optimization in linear regression (explored further in Chapter 9).
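The gradient \(-2\mathbf{e}^\top \mathbf{\Phi}\) can be verified against a finite-difference approximation of the loss. A sketch under assumed random data (the sizes, seed, and helper names are our choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
Phi = rng.normal(size=(20, 3))    # design matrix
y = rng.normal(size=20)           # targets
theta = rng.normal(size=3)        # parameter vector

def loss(th):
    e = y - Phi @ th
    return e @ e                  # L = e^T e

# Analytic gradient from the chain rule: dL/dtheta = -2 e^T Phi
e = y - Phi @ theta
grad_analytic = -2 * e @ Phi

# Central-difference check of each component
h = 1e-6
grad_fd = np.array([
    (loss(theta + h * np.eye(3)[j]) - loss(theta - h * np.eye(3)[j])) / (2 * h)
    for j in range(3)
])
print(np.allclose(grad_analytic, grad_fd, atol=1e-5))  # True
```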


5.3.1 Dimensional Summary of Derivatives

| Function type | Derivative dimension |
|---|---|
| \(f: \mathbb{R} \to \mathbb{R}\) | Scalar (1 × 1) |
| \(f: \mathbb{R}^D \to \mathbb{R}\) | Row vector (1 × D) |
| \(f: \mathbb{R} \to \mathbb{R}^E\) | Column vector (E × 1) |
| \(f: \mathbb{R}^D \to \mathbb{R}^E\) | Matrix (E × D) |
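These dimension conventions can be confirmed with a generic finite-difference Jacobian. In the sketch below (the helper `jacobian_fd` is our own), the shape of the result always comes out as outputs × inputs, matching the numerator layout in the table:

```python
import numpy as np

def jacobian_fd(func, x, h=1e-6):
    """Numerator-layout finite-difference Jacobian:
    rows index outputs, columns index inputs."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    f0 = np.atleast_1d(func(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        J[:, j] = (np.atleast_1d(func(x + e)) - np.atleast_1d(func(x - e))) / (2 * h)
    return J

# f: R^2 -> R^3 gives an E x D matrix
print(jacobian_fd(lambda v: np.array([v[0]**2 + v[1], v[0] * v[1], np.sin(v[0])]),
                  [1.0, 2.0]).shape)  # (3, 2)
# f: R -> R^3 gives a column vector (E x 1)
print(jacobian_fd(lambda t: np.array([t[0], t[0]**2, np.exp(t[0])]), [0.5]).shape)  # (3, 1)
# f: R^2 -> R gives a row vector (1 x D)
print(jacobian_fd(lambda v: v[0] * v[1], [1.0, 2.0]).shape)  # (1, 2)
```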

Exercises

Exercise 5.17 Compute the Jacobian for \(x = 4u-3v^2\) and \(y = u^2-6v\).

Exercise 5.18 Compute the Jacobian for \(x = \sqrt{u}\) and \(y = 10u + v\).

Exercise 5.19 Compute the Jacobian for \(x = v^3u\) and \(y = u^2/v\).

Exercise 5.20 Compute the Jacobian for \(x = u^2v^3\) and \(y = 4-2\sqrt{u}\).

Exercise 5.21 Define a matrix \(\mathbf{A}\) to be \(3 \times 3\) and a vector \(\mathbf{x}\) to be length 3. Define \(\mathbf{f}(\mathbf{x}) = \mathbf{A} \mathbf{x}\). Compute \(d \mathbf{f}/d \mathbf{x}\).

Exercise 5.22 Let \(\mathbf{r}(t) = \left[t^2 + 1,\; 3-t,\; t^3 \right]\). Find the unit tangent vector \(\mathbf{T}(t) = \mathbf{r}'(t)/\|\mathbf{r}'(t)\|\).

Exercise 5.23

Prove each of the following rules: \[\begin{array}{lrcll} \mathrm{i.} & \dfrac{d}{dt}[c\,\mathbf{r}(t)] & = & c\,\mathbf{r}'(t) & \text{Scalar multiple} \\ \mathrm{ii.} & \dfrac{d}{dt}[\mathbf{r}(t) \pm \mathbf{u}(t)] & = & \mathbf{r}'(t) \pm \mathbf{u}'(t) & \text{Sum and difference} \\ \mathrm{iii.} & \dfrac{d}{dt}[f(t)\,\mathbf{u}(t)] & = & f'(t)\,\mathbf{u}(t) + f(t)\,\mathbf{u}'(t) & \text{Scalar product} \\ \mathrm{iv.} & \dfrac{d}{dt}[\mathbf{r}(t) \cdot \mathbf{u}(t)] & = & \mathbf{r}'(t) \cdot \mathbf{u}(t) + \mathbf{r}(t) \cdot \mathbf{u}'(t) & \text{Dot product} \\ \mathrm{v.} & \dfrac{d}{dt}[\mathbf{r}(f(t))] & = & \mathbf{r}'(f(t))\, f'(t) & \text{Chain rule} \\ \mathrm{vi.} & \text{If } \mathbf{r}(t) \cdot \mathbf{r}(t) & = & c, \text{ then } \mathbf{r}(t) \cdot \mathbf{r}'(t) = 0. & \end{array}\]