6.6 Key Steps of PCA in Practice

PCA consists of a few main computational steps that identify a lower-dimensional principal subspace and project the data onto it.

1. Mean Subtraction

Center the data by subtracting the mean vector: \[ \boldsymbol{\mu} = \frac{1}{N} \sum_{n=1}^N \mathbf{x}_n, \quad \mathbf{x}_n \leftarrow \mathbf{x}_n - \boldsymbol{\mu}. \] This gives the dataset zero mean, which the covariance computation in step 3 assumes and which also avoids numerical problems.
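In code, this step is a one-line operation. The following minimal NumPy sketch (the variable names and toy data are illustrative, not part of the text) centers a data matrix whose rows are the data points:

```python
import numpy as np

# Illustrative data: N = 5 points in D = 2 dimensions, one point per row.
X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0], [0.0, 0.0]])

mu = X.mean(axis=0)   # per-dimension sample mean
Xc = X - mu           # centered data: every dimension now has zero mean
```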

2. Standardization

Divide each dimension by its standard deviation \(\sigma_d\): \[ x_{n}^{(d)} \leftarrow \frac{x_{n}^{(d)} - \mu_d}{\sigma_d}, \quad d = 1, \dots, D. \] This step removes scale dependence and gives each feature unit variance.
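Continuing the sketch above (the guard against zero-variance features is an added assumption, not part of the text):

```python
sigma = Xc.std(axis=0)                     # per-dimension standard deviation (1/N convention)
sigma = np.where(sigma > 0.0, sigma, 1.0)  # avoid division by zero for constant features
Xs = Xc / sigma                            # standardized data: unit variance in every dimension
```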

3. Eigendecomposition of the Covariance Matrix

Compute the covariance matrix \[ \mathbf{S} = \frac{1}{N} \mathbf{X} \mathbf{X}^\top, \] where \(\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_N] \in \mathbb{R}^{D \times N}\) collects the centered, standardized data points as columns. Then find its eigenvalues and eigenvectors: \[ \mathbf{S} \mathbf{u}_i = \lambda_i \mathbf{u}_i. \] The eigenvectors form an orthonormal basis (ONB), and the eigenvectors associated with the \(M\) largest eigenvalues span the principal subspace.
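Continuing the sketch, the covariance matrix and its eigendecomposition can be computed as follows; `np.linalg.eigh` is used here because \(\mathbf{S}\) is symmetric:

```python
N = Xs.shape[0]
S = (Xs.T @ Xs) / N                   # equals (1/N) X X^T when the data points are the columns of X
eigvals, eigvecs = np.linalg.eigh(S)  # symmetric eigendecomposition; eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]     # reorder by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```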

4. Projection

To project a new data point \(\mathbf{x}_\ast\) onto the principal subspace, proceed as follows (a code sketch follows these steps):

  1. Standardize using the training mean and standard deviation: \[ x_\ast^{(d)} \leftarrow \frac{x_\ast^{(d)} - \mu_d}{\sigma_d}. \]
  2. Project into the principal subspace: \[ \tilde{\mathbf{x}}_\ast = \mathbf{B}\mathbf{B}^\top \mathbf{x}_\ast, \] where \(\mathbf{B} = [\mathbf{b}_1, \dots, \mathbf{b}_M]\) contains the eigenvectors associated with the \(M\) largest eigenvalues as columns.
  3. Obtain low-dimensional coordinates: \[ \mathbf{z}_\ast = \mathbf{B}^\top \mathbf{x}_\ast. \]
  4. To recover the projection in the original data space: \[ \tilde{x}_\ast^{(d)} \leftarrow \tilde{x}_\ast^{(d)} \sigma_d + \mu_d. \]
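A minimal sketch of these four steps, reusing `mu`, `sigma`, and `eigvecs` from the sketches above; the choice of \(M\) and the test point are purely illustrative:

```python
M = 1                               # number of principal components to keep (illustrative)
B = eigvecs[:, :M]                  # eigenvectors of the M largest eigenvalues, as columns

x_star = np.array([1.0, 2.0])       # a hypothetical new data point
x_std = (x_star - mu) / sigma       # 1. standardize with the training mean and std
z_star = B.T @ x_std                # 3. low-dimensional coordinates z_* = B^T x_*
x_tilde_std = B @ z_star            # 2. projection B B^T x_* in the standardized space
x_tilde = x_tilde_std * sigma + mu  # 4. undo standardization to return to the data space
```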

Example 6.6 PCA applied to MNIST digits (e.g., digit “8”) shows that:

  • With few components (e.g., 1–10 PCs), reconstructions are blurry.
  • Increasing the number of PCs (up to roughly 500) yields near-perfect reconstructions.
  • The average squared reconstruction error \[ \frac{1}{N}\sum_{n=1}^N \| \mathbf{x}_n - \tilde{\mathbf{x}}_n \|^2 = \sum_{i=M+1}^{D} \lambda_i \] equals the sum of the discarded eigenvalues and decreases sharply with \(M\) until most of the variance is captured by a small number of components (a short sketch of this computation follows the list).
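Because of this identity, the error curve can be evaluated from the eigenvalue spectrum alone, without reconstructing any images. A minimal sketch (the spectrum shown is a placeholder, not the MNIST spectrum):

```python
import numpy as np

# Eigenvalues of the covariance matrix, sorted in decreasing order (placeholder values).
eigvals = np.array([1.6, 0.4])

# Average squared reconstruction error when keeping the top M components, for M = 0, ..., D.
errors = np.concatenate(([eigvals.sum()], eigvals.sum() - np.cumsum(eigvals)))
for M, err in enumerate(errors):
    print(f"M = {M}: average reconstruction error = {err:.4f}")
```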

Example 6.7 A Simple Centered Dataset

Consider five observations in \(\mathbb{R}^2\): \[ \mathbf{x}_1 = \begin{pmatrix} 2 \\ 0 \end{pmatrix}, \quad \mathbf{x}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad \mathbf{x}_3 = \begin{pmatrix} -2 \\ 0 \end{pmatrix}, \quad \mathbf{x}_4 = \begin{pmatrix} 0 \\ -1 \end{pmatrix}, \quad \mathbf{x}_5 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]

The sample mean is \[ \frac{1}{5}\sum_{n=1}^5 \mathbf{x}_n = \mathbf{0}, \]

so the data are already mean-centered. Geometrically, the data exhibit greater spread along the horizontal axis than along the vertical axis.

Compute the Covariance Matrix

For centered data, the empirical covariance matrix is \[ \mathbf{S} = \frac{1}{N} \sum_{n=1}^N \mathbf{x}_n \mathbf{x}_n^\top. \] Summing the outer products gives \[ \sum_{n=1}^5 \mathbf{x}_n \mathbf{x}_n^\top = \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix}. \] Therefore, \[ \mathbf{S} = \frac{1}{5} \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1.6 & 0 \\ 0 & 0.4 \end{pmatrix}. \]
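This computation is easy to verify numerically; a small NumPy check (the variable names are illustrative):

```python
import numpy as np

X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0], [0.0, 0.0]])  # rows = data points
S = (X.T @ X) / len(X)  # (1/N) * sum of outer products x_n x_n^T
print(S)                # [[1.6 0. ]
                        #  [0.  0.4]]
```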

Variance Along a Direction

Let \[ \mathbf{b} = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}, \quad \text{with } \|\mathbf{b}\|^2 = b_1^2 + b_2^2 = 1. \] The variance of the projection of the data onto \(\mathbf{b}\) is \[ V(\mathbf{b}) = \mathbf{b}^\top \mathbf{S} \mathbf{b} = 1.6 b_1^2 + 0.4 b_2^2. \] The PCA optimization problem is \[ \max_{\mathbf{b}} \; \mathbf{b}^\top \mathbf{S} \mathbf{b} \quad \text{subject to } \|\mathbf{b}\|^2 = 1. \]
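To make the objective concrete, \(V(\mathbf{b})\) can be evaluated for a few unit-length directions; a small check (the chosen directions are illustrative):

```python
import numpy as np

S = np.array([[1.6, 0.0], [0.0, 0.4]])
for b in [np.array([1.0, 0.0]),
          np.array([0.0, 1.0]),
          np.array([1.0, 1.0]) / np.sqrt(2.0)]:  # three unit-length directions
    print(b, b @ S @ b)  # projected variances: 1.6, 0.4, and 1.0, respectively
```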

Lagrangian Formulation

Introduce a Lagrange multiplier \(\lambda\): \[ L(\mathbf{b}, \lambda) = \mathbf{b}^\top \mathbf{S} \mathbf{b} + \lambda (1 - \mathbf{b}^\top \mathbf{b}). \] Setting the partial derivative with respect to \(\mathbf{b}\), \[ \frac{\partial L}{\partial \mathbf{b}} = 2\mathbf{S}\mathbf{b} - 2\lambda \mathbf{b}, \] to zero yields the eigenvalue problem \[ \mathbf{S} \mathbf{b} = \lambda \mathbf{b}. \] Thus, the optimal direction \(\mathbf{b}\) must be an eigenvector of the covariance matrix \(\mathbf{S}\), and since the projected variance is then \(\mathbf{b}^\top \mathbf{S} \mathbf{b} = \lambda\), the maximizer is the eigenvector with the largest eigenvalue.

Eigenvalues and Eigenvectors

Since \(\mathbf{S}\) is diagonal, \[ \mathbf{S} = \begin{pmatrix} 1.6 & 0 \\ 0 & 0.4 \end{pmatrix}, \] its eigenvalues are the diagonal entries and its eigenvectors are the standard basis vectors: \[ \lambda_1 = 1.6, \quad \mathbf{b}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \lambda_2 = 0.4, \quad \mathbf{b}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

First Principal Component

The largest eigenvalue is \(\lambda_1 = 1.6\), with eigenvector \[ \mathbf{b}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \] This vector defines the first principal component. The variance of the data projected onto this direction is \[ V_1 = \lambda_1 = 1.6. \]

Projection and Reconstruction

For any data point \(\mathbf{x}_n\), the one-dimensional projection is \[ z_{1n} = \mathbf{b}_1^\top \mathbf{x}_n. \] The rank-1 reconstruction using the first principal component is \[ \tilde{\mathbf{x}}_n = \mathbf{b}_1 \mathbf{b}_1^\top \mathbf{x}_n = \begin{pmatrix} x_{n1} \\ 0 \end{pmatrix}. \] Thus, PCA retains the horizontal coordinate and discards the vertical one.
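For the five toy points, the coordinates and rank-1 reconstructions can be computed directly; a short sketch under the same illustrative naming:

```python
import numpy as np

X = np.array([[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0], [0.0, 0.0]])
b1 = np.array([1.0, 0.0])   # first principal component

z = X @ b1                  # coordinates z_1n = b1^T x_n  ->  [ 2.  0. -2.  0.  0.]
X_tilde = np.outer(z, b1)   # rank-1 reconstructions b1 b1^T x_n
print(X_tilde)              # keeps each horizontal coordinate, zeroes the vertical one
```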


Exercises

Exercise 6.4 What are the steps of PCA?

Exercise 6.5 Consider the following data. What is the first principal component?

x y
1 -1
0 1
-1 0

Exercise 6.6

Exercise 6.7 Consider the following data. What is the first principal component?

x y
126 78
130 82
128 82
128 78

Exercise 6.8 Consider the following data. What is the first principal component?

x y
4 11
8 4
13 5
7 14

Exercise 6.9 Consider the following data. What are the first two principal components?

Math Science Arts
90 90 80
90 80 90
70 70 60
70 60 70
50 50 50
50 40 50

Exercise 6.10 Consider the following data. What is the first principal component?

Large Apples Rotten Apples Damaged Apples Small Apples
1 5 3 1
4 2 6 3
1 4 3 2
4 4 1 1
5 5 2 3

Exercise 6.11 Consider the following data. What is the first principal component?

f1 f2 f3 f4
1 2 3 4
5 5 6 7
1 4 2 3
5 3 2 1
8 1 2 2