5.2 Partial Differentiation and Gradients

Partial differentiation extends differentiation to functions of several variables \(f(x_1, x_2, \ldots, x_n) = f(\mathbf{x})\). The gradient generalizes the derivative to such functions: each of its components is a partial derivative, obtained by differentiating with respect to one variable while holding the others constant.

Definition 5.4 For a function \(f : \mathbb{R}^n \to \mathbb{R}\), the partial derivative with respect to each variable \(x_i\) is: \[ \frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(\mathbf{x})}{h} \]
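The limit in Definition 5.4 can be approximated numerically by fixing a small \(h\) instead of letting it tend to zero. The following is a minimal sketch under that assumption; the helper name `partial_derivative`, the step size `h = 1e-6`, and the test function \(f(x_1, x_2) = x_1^2 x_2\) are illustrative choices, not part of the text.

```python
def partial_derivative(f, x, i, h=1e-6):
    """Forward-difference estimate of the partial derivative of f at x w.r.t. x[i].

    Mirrors the difference quotient in Definition 5.4 with a small fixed h
    (an approximation, not the exact limit).
    """
    x_shifted = list(x)
    x_shifted[i] += h  # perturb only the i-th coordinate
    return (f(x_shifted) - f(x)) / h


# Hypothetical test function: f(x1, x2) = x1^2 * x2
def f(x):
    return x[0] ** 2 * x[1]


point = [3.0, 2.0]
print(partial_derivative(f, point, 0))  # approximately 2*x1*x2 = 12
print(partial_derivative(f, point, 1))  # approximately x1^2 = 9
```

Only the coordinate being differentiated is perturbed; all other coordinates stay fixed, which is exactly the "keeping others constant" idea.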

Collecting all partial derivatives into a row vector gives the gradient (the Jacobian of a scalar-valued function):

Definition 5.5 The gradient of a function of \(n\) variables is the vector of first partial derivatives with respect to each variable. \[ \nabla_x f = \text{grad} f = \frac{df}{d\mathbf{x}} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} \in \mathbb{R}^{1 \times n} \]

The gradient is defined here as a row vector for consistency with later generalizations.

Example 5.5 Given \(f(x, y) = (x + 2y^3)^2\), we have \[ \frac{\partial f}{\partial x} = 2(x + 2y^3) \] and \[ \frac{\partial f}{\partial y} = 2(x + 2y^3) \cdot 6y^2 = 12y^2(x + 2y^3). \] So, the gradient for \(f\) is given by \[ \nabla_x f = \text{grad} f = \frac{df}{d\mathbf{x}} =\left[2(x + 2y^3), \;\; 12y^2(x + 2y^3) \right].\]
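As a sanity check on Example 5.5, the analytic gradient can be compared with central finite differences. This is only a sketch: the evaluation point \((1, 1)\), the step size, and the function names are assumptions made for illustration.

```python
def f(x, y):
    # Function from Example 5.5
    return (x + 2 * y**3) ** 2


def grad_f(x, y):
    # Analytic gradient from Example 5.5, returned as a row [df/dx, df/dy]
    u = x + 2 * y**3
    return [2 * u, 12 * y**2 * u]


def numerical_grad(func, x, y, h=1e-6):
    # Central differences in each variable, holding the other one fixed
    dfdx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return [dfdx, dfdy]


analytic = grad_f(1.0, 1.0)
numeric = numerical_grad(f, 1.0, 1.0)
print(analytic)  # [6.0, 36.0]
print(numeric)   # should agree with the analytic values to several digits
```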

Although gradients are often written as column vectors, this text defines them as row vectors. This convention has two advantages:

  1. It generalizes consistently to vector-valued functions \(f: \mathbb{R}^n \to \mathbb{R}^m\).
  2. It gives a compact matrix form of the multivariate chain rule.

Example 5.6 For \(f(x_1, x_2) = x_1^2x_2 + x_1x_2^3\): \[ \frac{\partial f}{\partial x_1} = 2x_1x_2 + x_2^3 \] \[ \frac{\partial f}{\partial x_2} = x_1^2 + 3x_1x_2^2 \] Hence, the gradient is: \[ \frac{df}{d\mathbf{x} } = \begin{bmatrix} 2x_1x_2 + x_2^3 & x_1^2 + 3x_1x_2^2 \end{bmatrix} \in \mathbb{R}^{1 \times 2} \]


5.2.1 Basic Rules of Partial Differentiation

The basic differentiation rules extend naturally to multivariate functions.

Theorem 5.2 Let \(f\) and \(g\) be differentiable functions.

  • Product Rule:
    \[ (f(\mathbf{x})g(\mathbf{x}))' = f'(\mathbf{x})g(\mathbf{x}) + f(\mathbf{x})g'(\mathbf{x}) \]

  • Sum Rule:
    \[ (f(\mathbf{x}) + g(\mathbf{x}))' = f'(\mathbf{x}) + g'(\mathbf{x}) \]

  • Chain Rule:
    \[ (g \circ f)'(\mathbf{x}) = g'(f(\mathbf{x})) f'(\mathbf{x}) \]
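For instance, the product rule can be checked on a small case. The functions below are chosen purely for illustration and are not from the text, and the prime is read here as the row-vector gradient. Take \(f(\mathbf{x}) = x_1 x_2\) and \(g(\mathbf{x}) = x_1 + x_2\). Then \[ \nabla_x f = \begin{bmatrix} x_2 & x_1 \end{bmatrix}, \qquad \nabla_x g = \begin{bmatrix} 1 & 1 \end{bmatrix}, \] so the product rule gives \[ \nabla_x (fg) = \begin{bmatrix} x_2 & x_1 \end{bmatrix}(x_1 + x_2) + x_1 x_2 \begin{bmatrix} 1 & 1 \end{bmatrix} = \begin{bmatrix} 2x_1x_2 + x_2^2 & x_1^2 + 2x_1x_2 \end{bmatrix}. \] Differentiating the expanded product \(fg = x_1^2 x_2 + x_1 x_2^2\) directly yields the same two components.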


5.2.2 Multivariate Chain Rule (Matrix Form)

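Definition 5.6 For a function \(f(x_1, x_2)\) whose arguments \(x_1(s, t)\) and \(x_2(s, t)\) are themselves functions of \(s\) and \(t\), the chain rule can be written compactly as a matrix multiplication: \[ \frac{df}{d(s, t)} = \frac{\partial f}{\partial \mathbf{x}} \frac{\partial \mathbf{x}}{\partial (s, t)} \]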

\[ = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} \end{bmatrix} \begin{bmatrix} \frac{\partial x_1}{\partial s} & \frac{\partial x_1}{\partial t} \\ \frac{\partial x_2}{\partial s} & \frac{\partial x_2}{\partial t} \end{bmatrix} \]

This product is only well defined as written if the gradient is a row vector: a \(1 \times 2\) row times a \(2 \times 2\) matrix gives a \(1 \times 2\) row.
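The matrix form can be sketched in a few lines of code. The example below is an assumption made for illustration and does not come from the text: it uses \(f(x_1, x_2) = x_1^2 + 2x_2\) with \(x_1 = st\) and \(x_2 = s + t\), and a hypothetical helper `row_times_matrix` for the row-by-matrix product.

```python
def row_times_matrix(row, matrix):
    # Multiply a 1 x n row vector by an n x m matrix (plain Python lists)
    n_cols = len(matrix[0])
    return [sum(row[k] * matrix[k][j] for k in range(len(row))) for j in range(n_cols)]


def chain_rule_gradient(s, t):
    # Inner functions (illustrative assumption): x1 = s*t, x2 = s + t
    x1 = s * t
    # Row gradient of f(x1, x2) = x1^2 + 2*x2 with respect to (x1, x2)
    grad_f = [2 * x1, 2.0]
    # Jacobian of (x1, x2) with respect to (s, t)
    jac_x = [[t, s],
             [1.0, 1.0]]
    return row_times_matrix(grad_f, jac_x)


print(chain_rule_gradient(1.0, 2.0))  # [10.0, 6.0]
```

Substituting directly gives \(f = s^2t^2 + 2s + 2t\), whose partial derivatives \(2st^2 + 2\) and \(2s^2t + 2\) evaluate to 10 and 6 at \((s, t) = (1, 2)\), matching the matrix product.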

Example 5.7 Let
\[ f(x, y) = x^2 y + \sin(y) \] where \(x\) and \(y\) are functions of a single variable \(t\) given by \[ x(t) = t^2, \qquad y(t) = e^{t}. \] Compute
\[ \frac{d}{dt} \, f(x(t), y(t)). \]

By the multivariable chain rule, \[ \frac{d}{dt} f(x(t), y(t)) = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}. \]

First compute the partial derivatives: \[ \frac{\partial f}{\partial x} = 2xy, \qquad \frac{\partial f}{\partial y} = x^2 + \cos(y). \]

Next compute the derivatives of \(x(t)\) and \(y(t)\): \[ \frac{dx}{dt} = 2t, \qquad \frac{dy}{dt} = e^{t}. \]

Substitute \(x(t) = t^2\) and \(y(t) = e^t\): \[ \frac{d}{dt} f(x(t), y(t)) = (2 t^2 e^t)(2t) + (t^4 + \cos(e^t)) e^t = 4 t^3 e^t + t^4 e^t + e^t \cos(e^t). \]
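The result of Example 5.7 can be checked numerically by differentiating the composite function \(t \mapsto f(x(t), y(t))\) with a central difference. This is a rough check rather than a proof; the evaluation point \(t = 1\), the step size, and the function names are illustrative assumptions.

```python
import math


def f_of_t(t):
    # Composite function f(x(t), y(t)) with x = t^2 and y = e^t
    x, y = t**2, math.exp(t)
    return x**2 * y + math.sin(y)


def dfdt_closed_form(t):
    # Chain-rule result from Example 5.7
    e = math.exp(t)
    return 4 * t**3 * e + t**4 * e + e * math.cos(e)


def dfdt_numeric(t, h=1e-6):
    # Central-difference approximation of the derivative with respect to t
    return (f_of_t(t + h) - f_of_t(t - h)) / (2 * h)


t0 = 1.0
print(dfdt_closed_form(t0))
print(dfdt_numeric(t0))  # should agree with the closed form to several digits
```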


Exercises

Exercise 5.8 Find the gradient of \(f\left( {x,y,z} \right) = {x^2}z + {y^3}{z^2} - xyz\).

Exercise 5.9 Find the gradient of \(f(x,y) = xe^{xy}\).

Exercise 5.10 Find the gradient of \(f\left( {x,y} \right) = x\cos \left( y \right)\).

Exercise 5.11 Find the gradient of \(f\left( {x,y,z} \right) = \sin \left( {yz} \right) + \ln \left( {{x^2}} \right)\).

Exercise 5.12 The rate of change of \(f(x,y,z)\) in the direction of a unit vector \(\mathbf{u}\) is called the directional derivative and is denoted by \(D_{\mathbf{u}} f(x,y,z)\). One can show that \[D_{\mathbf{u}} f(x,y,z) = \langle f_x, f_y, f_z \rangle \cdot \langle a,b,c \rangle,\] where \(\mathbf{u} = \langle a,b,c \rangle\). Prove that the maximum value of \(D_{\mathbf{u}} f(x,y,z)\) (and hence the maximum rate of change of \(f(x,y,z)\)) is \(\|\nabla f(x,y,z)\|\) and that it occurs in the direction of \(\nabla f(x,y,z)\).

Exercise 5.13 Suppose the level curve \(f(x,y) = k\) is parametrized by \((x(t), y(t))\). Prove that the gradient vector \(\nabla f(x_0,y_0)\) is orthogonal to the level curve at the point \((x(t_0),y(t_0)) = (x_0,y_0)\).

Exercise 5.14 Find \(\frac{dz}{dt}\) for \[z = xe^{xy}, \;\;\;\; x = t^2, \;\;\;\; y = t^{-1}.\]

Exercise 5.15 Find \(\frac{dz}{dt}\) for \[z = {x^2}{y^3} + y\cos x, \;\;\;\; x = \ln(t^2), \;\;\;\; y = \sin(4t).\]

Exercise 5.16 Find \(\frac{dz}{dx}\) for \[z = x\ln \left( {xy} \right) + {y^3}, \;\;\;\; y=\cos(x^2+1).\]