6.6 Key Steps of PCA in Practice
In practice, PCA consists of a few main computational steps that identify a lower-dimensional principal subspace and project the data onto it.
1. Mean Subtraction
Center the data by subtracting the mean vector: \[ \boldsymbol{\mu} = \frac{1}{N} \sum_{n=1}^N \mathbf{x}_n, \quad \mathbf{x}_n \leftarrow \mathbf{x}_n - \boldsymbol{\mu}. \] This ensures the dataset has zero mean, which improves numerical stability.
2. Standardization
Divide each dimension of the centered data by its standard deviation \(\sigma_d\): \[ x_{n}^{(d)} \leftarrow \frac{x_{n}^{(d)}}{\sigma_d}, \quad d = 1, \dots, D. \] This step removes the scale (unit) dependence of the data and gives each feature unit variance. Steps 1 and 2 together amount to applying \(x_{n}^{(d)} \leftarrow (x_{n}^{(d)} - \mu_d)/\sigma_d\) to the raw data.
3. Eigendecomposition of the Covariance Matrix
Compute the covariance matrix of the standardized data, \[ \mathbf{S} = \frac{1}{N} \mathbf{X} \mathbf{X}^\top, \] where \(\mathbf{X} \in \mathbb{R}^{D \times N}\) is the matrix whose columns are the standardized data points. Then find its eigenvalues and eigenvectors: \[ \mathbf{S} \mathbf{u}_i = \lambda_i \mathbf{u}_i. \] The eigenvectors form an orthonormal basis (ONB), and the eigenvectors associated with the \(M\) largest eigenvalues span the \(M\)-dimensional principal subspace.
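The following NumPy sketch illustrates steps 1–3. It is a minimal sketch, not the text's reference implementation: the random data and the names `X`, `mu`, `sigma`, `B` are placeholders chosen for illustration.

```python
import numpy as np

# Illustrative data matrix: rows are features, columns are data points (D x N).
rng = np.random.default_rng(0)
D, N = 3, 100
X = rng.normal(size=(D, N))

# Step 1: mean subtraction.
mu = X.mean(axis=1, keepdims=True)
X = X - mu

# Step 2: standardization (divide each dimension by its standard deviation).
sigma = X.std(axis=1, keepdims=True)
X = X / sigma

# Step 3: covariance matrix and its eigendecomposition.
S = (X @ X.T) / N
eigvals, eigvecs = np.linalg.eigh(S)   # eigh exploits that S is symmetric
order = np.argsort(eigvals)[::-1]      # sort eigenvalues in decreasing order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

M = 2                                  # dimensionality of the principal subspace
B = eigvecs[:, :M]                     # columns: eigenvectors of the M largest eigenvalues
```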
4. Projection
To project a new data point \(\mathbf{x}_\ast\) onto the principal subspace (see the sketch after these steps):
- Standardize using the training mean and standard deviation:
\[
x_\ast^{(d)} \leftarrow \frac{x_\ast^{(d)} - \mu_d}{\sigma_d}.
\]
- Project:
\[
\tilde{\mathbf{x}}_\ast = \mathbf{B}\mathbf{B}^\top \mathbf{x}_\ast,
\]
where the columns of \(\mathbf{B}\) are the eigenvectors associated with the \(M\) largest eigenvalues of \(\mathbf{S}\).
- Obtain low-dimensional coordinates: \[ \mathbf{z}_\ast = \mathbf{B}^\top \mathbf{x}_\ast. \]
- To recover the projection in the original data space: \[ \tilde{x}_\ast^{(d)} \leftarrow \tilde{x}_\ast^{(d)} \sigma_d + \mu_d. \]
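Continuing the sketch from step 3 (reusing `mu`, `sigma`, and `B` defined there), the projection of a new point could look as follows; `x_star` is again a placeholder.

```python
# A new data point, stored as a column vector of shape (D, 1).
x_star = rng.normal(size=(D, 1))

# Standardize with the training mean and standard deviation.
x_star_std = (x_star - mu) / sigma

# Low-dimensional coordinates and the projection onto the principal subspace.
z_star = B.T @ x_star_std          # coordinates with respect to the basis B
x_star_proj = B @ z_star           # equals B @ B.T @ x_star_std

# Undo the standardization to obtain the projection in the original data space.
x_star_orig = x_star_proj * sigma + mu
```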
Example 6.6 PCA applied to MNIST digits (e.g., digit “8”) shows that:
- With few components (e.g., 1–10 PCs), reconstructions are blurry.
- Increasing PCs (up to ~500) yields near-perfect reconstructions.
- The average squared reconstruction error \[ \frac{1}{N}\sum_{n=1}^N \| \mathbf{x}_n - \tilde{\mathbf{x}}_n \|^2 = \sum_{i=M+1}^{D} \lambda_i \] decreases sharply with the number of components \(M\): most of the variance is captured by a small number of components.
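The identity between the average squared reconstruction error and the sum of the discarded eigenvalues can be checked numerically. The sketch below uses synthetic data rather than MNIST; the dimensions and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, M = 10, 500, 3
X = rng.normal(size=(D, N)) * np.arange(1, D + 1)[:, None]  # features with unequal variances
X = X - X.mean(axis=1, keepdims=True)                       # centered data

S = (X @ X.T) / N
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

B = eigvecs[:, :M]
X_tilde = B @ (B.T @ X)                                      # rank-M reconstruction

avg_error = np.mean(np.sum((X - X_tilde) ** 2, axis=0))      # (1/N) sum_n ||x_n - x_tilde_n||^2
discarded = eigvals[M:].sum()                                # sum of the D - M smallest eigenvalues
print(avg_error, discarded)                                  # agree up to floating-point error
```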
Example 6.7 A Simple Centered Dataset
Consider five observations in \(\mathbb{R}^2\): \[ \mathbf{x}_1 = \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \quad \mathbf{x}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad \mathbf{x}_3 = \begin{pmatrix} -2 \\ 0 \end{pmatrix}, \quad \mathbf{x}_4 = \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \quad \mathbf{x}_5 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
The sample mean is \[ \frac{1}{5}\sum_{n=1}^5 \mathbf{x}_n = \mathbf{0}, \]
so the data are already mean-centered. Geometrically, the data exhibit greater spread along the horizontal axis than the vertical axis.
Covariance Matrix
For centered data, the empirical covariance matrix is \[ \mathbf{S} = \frac{1}{N} \sum_{n=1}^N \mathbf{x}_n \mathbf{x}_n^\top. \] Since \(\mathbf{x}_1\mathbf{x}_1^\top = \mathbf{x}_3\mathbf{x}_3^\top = \begin{pmatrix} 4 & 0 \\ 0 & 0 \end{pmatrix}\), \(\mathbf{x}_2\mathbf{x}_2^\top = \mathbf{x}_4\mathbf{x}_4^\top = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\), and \(\mathbf{x}_5\mathbf{x}_5^\top = \mathbf{0}\), summing the outer products gives \[ \sum_{n=1}^5 \mathbf{x}_n \mathbf{x}_n^\top = \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix}. \] Therefore, \[ \mathbf{S} = \frac{1}{5} \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1.6 & 0 \\ 0 & 0.4 \end{pmatrix}. \]
Variance Along a Direction
Let \[ \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \quad \text{with } \|\mathbf{b}\|^2 = b_1^2 + b_2^2 = 1. \] The variance of the projection of the data onto \(\mathbf{b}\) is \[ V(\mathbf{b}) = \mathbf{b}^\top \mathbf{S} \mathbf{b} = 1.6 b_1^2 + 0.4 b_2^2. \] The PCA optimization problem is \[ \max_{\mathbf{b}} \; \mathbf{b}^\top \mathbf{S} \mathbf{b} \quad \text{subject to } \|\mathbf{b}\|^2 = 1. \]
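For concreteness, evaluating \(V(\mathbf{b})\) at a few unit vectors shows how the variance depends on the direction: \[ V\!\left(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\right) = 1.6, \qquad V\!\left(\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = 0.4, \qquad V\!\left(\begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}\right) = \tfrac{1}{2}\cdot 1.6 + \tfrac{1}{2}\cdot 0.4 = 1.0. \] Any direction that mixes the two axes attains less variance than the horizontal axis alone, which already hints at the solution of the maximization problem.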
Lagrangian Formulation
Introduce a Lagrange multiplier \(\lambda\): \[ L(\mathbf{b}, \lambda) = \mathbf{b}^\top \mathbf{S} \mathbf{b} + \lambda (1 - \mathbf{b}^\top \mathbf{b}). \] Setting the partial derivatives to zero gives \[ \frac{\partial L}{\partial \mathbf{b}} = 2\mathbf{S}\mathbf{b} - 2\lambda \mathbf{b} = \mathbf{0}, \qquad \frac{\partial L}{\partial \lambda} = 1 - \mathbf{b}^\top \mathbf{b} = 0, \] so that \[ \mathbf{S} \mathbf{b} = \lambda \mathbf{b}, \qquad \mathbf{b}^\top \mathbf{b} = 1. \] Thus, the optimal direction \(\mathbf{b}\) must be a unit-norm eigenvector of the covariance matrix \(\mathbf{S}\), and the attained variance is \(\mathbf{b}^\top \mathbf{S} \mathbf{b} = \lambda\), so the maximum is achieved by the eigenvector with the largest eigenvalue.
Eigenvalues and Eigenvectors
Since \(\mathbf{S}\) is diagonal, \[ \mathbf{S} = \begin{pmatrix} 1.6 & 0 \\ 0 & 0.4 \end{pmatrix}, \] its eigenvalues and eigenvectors are:
| Eigenvalue | Eigenvector |
|---|---|
| \(\lambda_1 = 1.6\) | \(\begin{pmatrix} 1 \\ 0 \end{pmatrix}\) |
| \(\lambda_2 = 0.4\) | \(\begin{pmatrix} 0 \\ 1 \end{pmatrix}\) |
First Principal Component
The largest eigenvalue is \(\lambda_1 = 1.6\), with eigenvector \[ \mathbf{b}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \] This vector defines the first principal component. The variance of the data projected onto this direction is \[ V_1 = \lambda_1 = 1.6. \]
Projection and Reconstruction
For any data point \(\mathbf{x}_n\), the one-dimensional projection is \[ z_{1n} = \mathbf{b}_1^\top \mathbf{x}_n. \] The rank-1 reconstruction using the first principal component is \[ \tilde{\mathbf{x}}_n = \mathbf{b}_1 \mathbf{b}_1^\top \mathbf{x}_n = \begin{pmatrix} x_{n1} \\ 0 \end{pmatrix}. \] Thus, PCA retains the horizontal coordinate and discards the vertical one.
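A short NumPy check of Example 6.7 confirms these numbers (the array layout and variable names are illustrative):

```python
import numpy as np

# The five observations of Example 6.7 as columns of a 2 x 5 matrix.
X = np.array([[2.0, 0.0, -2.0,  0.0, 0.0],
              [0.0, 1.0,  0.0, -1.0, 0.0]])
N = X.shape[1]

S = (X @ X.T) / N                  # empirical covariance of the centered data
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(S)                           # [[1.6, 0.0], [0.0, 0.4]]
print(eigvals)                     # [1.6, 0.4]

b1 = eigvecs[:, [0]]               # first principal component, (1, 0)^T up to sign
z1 = b1.T @ X                      # one-dimensional codes z_{1n}
X_tilde = b1 @ z1                  # rank-1 reconstruction
print(X_tilde)                     # [[2, 0, -2, 0, 0], [0, 0, 0, 0, 0]]
```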
Exercises
Exercise 6.4 What are the steps of PCA?
Exercise 6.5 Consider the following data. What is the first principal component?
| x | y |
|---|---|
| 1 | -1 |
| 0 | 1 |
| -1 | 0 |
Exercise 6.6
Exercise 6.7 Consider the following data. What is the first principal component?
| x | y |
|---|---|
| 126 | 78 |
| 130 | 82 |
| 128 | 82 |
| 128 | 78 |
Exercise 6.8 Consider the following data. What is the first principal component?
| x | y |
|---|---|
| 4 | 11 |
| 8 | 4 |
| 13 | 5 |
| 7 | 14 |
Exercise 6.9 Consider the following data. What are the first two principal components?
| Math | Science | Arts |
|---|---|---|
| 90 | 90 | 80 |
| 90 | 80 | 90 |
| 70 | 70 | 60 |
| 70 | 60 | 70 |
| 50 | 50 | 50 |
| 50 | 40 | 50 |
Exercise 6.10 Consider the following data. What is the first principal component?
| Large Apples | Rotten Apples | Damaged Apples | Small Apples |
|---|---|---|---|
| 1 | 5 | 3 | 1 |
| 4 | 2 | 6 | 3 |
| 1 | 4 | 3 | 2 |
| 4 | 4 | 1 | 1 |
| 5 | 5 | 2 | 3 |
Exercise 6.11 Consider the following data. What is the first principal component?
| f1 | f2 | f3 | f4 |
|---|---|---|---|
| 1 | 2 | 3 | 4 |
| 5 | 5 | 6 | 7 |
| 1 | 4 | 2 | 3 |
| 5 | 3 | 2 | 1 |
| 8 | 1 | 2 | 2 |