6.6 Key Steps of PCA in Practice

PCA consists of a few main computational steps that identify a lower-dimensional principal subspace and project the data onto it.

1. Mean Subtraction

Center the data by subtracting the mean vector: \[ \boldsymbol{\mu} = \frac{1}{N} \sum_{n=1}^N \mathbf{x}_n, \quad \mathbf{x}_n \leftarrow \mathbf{x}_n - \boldsymbol{\mu}. \] This gives the dataset zero mean, which the covariance computation in step 3 assumes and which also avoids numerical problems.
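In code, this step is a one-line operation. The following minimal NumPy sketch (the variable names and toy data are illustrative, not part of the text) centers a data matrix whose rows are the data points:

```python
import numpy as np

# Illustrative data: N = 5 points in D = 2 dimensions, one point per row.
X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0], [0.0, 0.0]])

mu = X.mean(axis=0)   # per-dimension sample mean
Xc = X - mu           # centered data: every dimension now has zero mean
```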

2. Standardization

Divide each dimension by its standard deviation \(\sigma_d\): \[ x_{n}^{(d)} \leftarrow \frac{x_{n}^{(d)} - \mu_d}{\sigma_d}, \quad d = 1, \dots, D. \] This step removes scale dependence and gives each feature unit variance.
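Continuing the sketch above (the guard against zero-variance features is an added assumption, not part of the text):

```python
sigma = Xc.std(axis=0)                     # per-dimension standard deviation (1/N convention)
sigma = np.where(sigma > 0.0, sigma, 1.0)  # avoid division by zero for constant features
Xs = Xc / sigma                            # standardized data: unit variance in every dimension
```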

3. Eigendecomposition of the Covariance Matrix

Compute the covariance matrix \[ \mathbf{S} = \frac{1}{N} \mathbf{X} \mathbf{X}^\top, \] where \(\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_N] \in \mathbb{R}^{D \times N}\) collects the centered, standardized data points as columns. Then find its eigenvalues and eigenvectors: \[ \mathbf{S} \mathbf{u}_i = \lambda_i \mathbf{u}_i. \] The eigenvectors form an orthonormal basis (ONB), and the eigenvectors associated with the \(M\) largest eigenvalues span the principal subspace.
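Continuing the sketch, the covariance matrix and its eigendecomposition can be computed as follows; `np.linalg.eigh` is used here because \(\mathbf{S}\) is symmetric:

```python
N = Xs.shape[0]
S = (Xs.T @ Xs) / N                   # equals (1/N) X X^T when the data points are the columns of X
eigvals, eigvecs = np.linalg.eigh(S)  # symmetric eigendecomposition; eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]     # reorder by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```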

4. Projection

To project a new data point \(\mathbf{x}_\ast\) onto the principal subspace, proceed as follows (a code sketch follows these steps):

  1. Standardize using the training mean and standard deviation: \[ x_\ast^{(d)} \leftarrow \frac{x_\ast^{(d)} - \mu_d}{\sigma_d}. \]
  2. Project into the principal subspace: \[ \tilde{\mathbf{x}}_\ast = \mathbf{B}\mathbf{B}^\top \mathbf{x}_\ast, \] where \(\mathbf{B} = [\mathbf{b}_1, \dots, \mathbf{b}_M]\) contains the eigenvectors associated with the \(M\) largest eigenvalues as columns.
  3. Obtain low-dimensional coordinates: \[ \mathbf{z}_\ast = \mathbf{B}^\top \mathbf{x}_\ast. \]
  4. To recover the projection in the original data space: \[ \tilde{x}_\ast^{(d)} \leftarrow \tilde{x}_\ast^{(d)} \sigma_d + \mu_d. \]
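A minimal sketch of these four steps, reusing `mu`, `sigma`, and `eigvecs` from the sketches above; the choice of \(M\) and the test point are purely illustrative:

```python
M = 1                               # number of principal components to keep (illustrative)
B = eigvecs[:, :M]                  # eigenvectors of the M largest eigenvalues, as columns

x_star = np.array([1.0, 2.0])       # a hypothetical new data point
x_std = (x_star - mu) / sigma       # 1. standardize with the training mean and std
z_star = B.T @ x_std                # 3. low-dimensional coordinates z_* = B^T x_*
x_tilde_std = B @ z_star            # 2. projection B B^T x_* in the standardized space
x_tilde = x_tilde_std * sigma + mu  # 4. undo standardization to return to the data space
```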

Example 6.6 PCA applied to MNIST digits (e.g., digit “8”) shows that:

  • With few components (e.g., 1–10 PCs), reconstructions are blurry.
  • Increasing the number of PCs (up to roughly 500) yields near-perfect reconstructions.
  • The average squared reconstruction error \[ \frac{1}{N}\sum_{n=1}^N \| \mathbf{x}_n - \tilde{\mathbf{x}}_n \|^2 = \sum_{i=M+1}^{D} \lambda_i \] equals the sum of the discarded eigenvalues and decreases sharply with \(M\) until most of the variance is captured by a small number of components (a short sketch of this computation follows the list).
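Because of this identity, the error curve can be evaluated from the eigenvalue spectrum alone, without reconstructing any images. A minimal sketch (the spectrum shown is a placeholder, not the MNIST spectrum):

```python
import numpy as np

# Eigenvalues of the covariance matrix, sorted in decreasing order (placeholder values).
eigvals = np.array([1.6, 0.4])

# Average squared reconstruction error when keeping the top M components, for M = 0, ..., D.
errors = np.concatenate(([eigvals.sum()], eigvals.sum() - np.cumsum(eigvals)))
for M, err in enumerate(errors):
    print(f"M = {M}: average reconstruction error = {err:.4f}")
```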

Example 6.7 A Simple Centered Dataset

Consider five observations in \(\mathbb{R}^2\): \[ \mathbf{x}_1 = \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \quad \mathbf{x}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad \mathbf{x}_3 = \begin{pmatrix} -2 \\ 0 \end{pmatrix}, \quad \mathbf{x}_4 = \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \quad \mathbf{x}_5 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]

The sample mean is \[ \frac{1}{5}\sum_{n=1}^5 \mathbf{x}_n = \mathbf{0}, \]

so the data are already mean-centered. Geometrically, the data exhibit greater spread along the horizontal axis than along the vertical axis.

Compute the Covariance Matrix

For centered data, the empirical covariance matrix is \[ \mathbf{S} = \frac{1}{N} \sum_{n=1}^N \mathbf{x}_n \mathbf{x}_n^\top. \] Summing the outer products gives \[ \sum_{n=1}^5 \mathbf{x}_n \mathbf{x}_n^\top = \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix}. \] Therefore, \[ \mathbf{S} = \frac{1}{5} \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1.6 & 0 \\ 0 & 0.4 \end{pmatrix}. \]
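This computation is easy to verify numerically; a small NumPy check (the variable names are illustrative):

```python
import numpy as np

X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0], [0.0, 0.0]])  # rows = data points
S = (X.T @ X) / len(X)  # (1/N) * sum of outer products x_n x_n^T
print(S)                # [[1.6 0. ]
                        #  [0.  0.4]]
```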

Variance Along a Direction

Let \[ \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \quad \text{with } \|\mathbf{b}\|^2 = b_1^2 + b_2^2 = 1. \] The variance of the projection of the data onto \(\mathbf{b}\) is \[ V(\mathbf{b}) = \mathbf{b}^\top \mathbf{S} \mathbf{b} = 1.6 b_1^2 + 0.4 b_2^2. \] The PCA optimization problem is \[ \max_{\mathbf{b}} \; \mathbf{b}^\top \mathbf{S} \mathbf{b} \quad \text{subject to } \|\mathbf{b}\|^2 = 1. \]
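To make the objective concrete, \(V(\mathbf{b})\) can be evaluated for a few unit-length directions; a small check (the chosen directions are illustrative):

```python
import numpy as np

S = np.array([[1.6, 0.0], [0.0, 0.4]])
for b in [np.array([1.0, 0.0]),
          np.array([0.0, 1.0]),
          np.array([1.0, 1.0]) / np.sqrt(2.0)]:  # three unit-length directions
    print(b, b @ S @ b)  # projected variances: 1.6, 0.4, and 1.0, respectively
```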

Lagrangian Formulation

Introduce a Lagrange multiplier \(\lambda\): \[ L(\mathbf{b}, \lambda) = \mathbf{b}^\top \mathbf{S} \mathbf{b} + \lambda (1 - \mathbf{b}^\top \mathbf{b}). \] Setting the partial derivative with respect to \(\mathbf{b}\), \[ \frac{\partial L}{\partial \mathbf{b}} = 2\mathbf{S}\mathbf{b} - 2\lambda \mathbf{b}, \] to zero yields the eigenvalue problem \[ \mathbf{S} \mathbf{b} = \lambda \mathbf{b}. \] Thus, the optimal direction \(\mathbf{b}\) must be an eigenvector of the covariance matrix \(\mathbf{S}\), and since the projected variance is then \(\mathbf{b}^\top \mathbf{S} \mathbf{b} = \lambda\), the maximizer is the eigenvector with the largest eigenvalue.

Eigenvalues and Eigenvectors

Since \(\mathbf{S}\) is diagonal, \[ \mathbf{S} = \begin{pmatrix} 1.6 & 0 \\ 0 & 0.4 \end{pmatrix}, \] its eigenvalues are the diagonal entries and its eigenvectors are the standard basis vectors: \[ \lambda_1 = 1.6, \quad \mathbf{b}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \lambda_2 = 0.4, \quad \mathbf{b}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

First Principal Component

The largest eigenvalue is \(\lambda_1 = 1.6\), with eigenvector \[ \mathbf{b}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \] This vector defines the first principal component. The variance of the data projected onto this direction is \[ V_1 = \lambda_1 = 1.6. \]

Projection and Reconstruction

For any data point \(\mathbf{x}_n\), the one-dimensional projection is \[ z_{1n} = \mathbf{b}_1^\top \mathbf{x}_n. \] The rank-1 reconstruction using the first principal component is \[ \tilde{\mathbf{x}}_n = \mathbf{b}_1 \mathbf{b}_1^\top \mathbf{x}_n = \begin{pmatrix} x_{n1} \\ 0 \end{pmatrix}. \] Thus, PCA retains the horizontal coordinate and discards the vertical one.
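For the five toy points, the coordinates and rank-1 reconstructions can be computed directly; a short sketch under the same illustrative naming:

```python
import numpy as np

X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0], [0.0, 0.0]])
b1 = np.array([1.0, 0.0])   # first principal component

z = X @ b1                  # coordinates z_1n = b1^T x_n  ->  [ 2.  0. -2.  0.  0.]
X_tilde = np.outer(z, b1)   # rank-1 reconstructions b1 b1^T x_n
print(X_tilde)              # keeps each horizontal coordinate, zeroes the vertical one
```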


Exercises

Exercise 6.4 What are the steps of PCA?

Exercise 6.5 Consider the following data. What is the first principal component?

x y
1 -1
0 1
-1 0

Exercise 6.6

Exercise 6.7 Consider the following data. What is the first principal component?

x y
126 78
130 82
128 82
128 78

Exercise 6.8 Consider the following data. What is the first principal component?

x y
4 11
8 4
13 5
7 14

Exercise 6.9 Consider the following data. What are the first two principal components?

Math Science Arts
90 90 80
90 80 90
70 70 60
70 60 70
50 50 50
50 40 50

Exercise 6.10 Consider the following data. What is the first principal component?

Large Apples Rotten Apples Damaged Apples Small Apples
1 5 3 1
4 2 6 3
1 4 3 2
4 4 1 1
5 5 2 3

Exercise 6.11 Consider the following data. What is the first principal component?

f1 f2 f3 f4
1 2 3 4
5 5 6 7
1 4 2 3
5 3 2 1
8 1 2 2