6.1 Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a method for linear dimensionality reduction.
The idea was originally proposed by Pearson (1901) and Hotelling (1933). In signal processing, PCA is also known as the Karhunen–Loève Transform.
PCA is used for data compression, visualization, and identifying patterns or latent factors in high-dimensional data.

PCA rests on the assumption that high-dimensional data often lies approximately in a low-dimensional subspace.
By projecting the data onto this subspace, PCA reduces the dimensionality while retaining as much of the variance (information) as possible.

Let:

  • \(\mathcal{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}\), with \(\mathbf{x}_n \in \mathbb{R}^D\) (mean-centered data)
  • The covariance matrix be defined as:
    \[ \mathbf{S} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n \mathbf{x}_n^\top \]
  • A low-dimensional representation (code):
    \[ \mathbf{z}_n = \mathbf{B}^\top \mathbf{x}_n \in \mathbb{R}^M, \quad M < D \]
  • The projection matrix:
    \[ \mathbf{B} = [\mathbf{b}_1, \dots, \mathbf{b}_M] \in \mathbb{R}^{D \times M} \] where the columns \(\mathbf{b}_i\) are orthonormal: \(\mathbf{b}_i^\top \mathbf{b}_j = 0\) if \(i \neq j\), and \(\mathbf{b}_i^\top \mathbf{b}_i = 1\).

The projected data is given by: \[ \tilde{\mathbf{x}}_n = \mathbf{B}\mathbf{B}^\top \mathbf{x}_n \in \mathbb{R}^D \]

The goal of PCA is to find \(\mathbf{B}\) and \(\mathbf{z}_n\) that minimize information loss between \(\mathbf{x}_n\) and \(\tilde{\mathbf{x}}_n\).
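
The following NumPy sketch makes these quantities concrete: it mean-centers a toy data set, forms \(\mathbf{S}\), and computes the codes \(\mathbf{z}_n\) and reconstructions \(\tilde{\mathbf{x}}_n\). It uses the standard PCA choice for \(\mathbf{B}\), namely the eigenvectors of \(\mathbf{S}\) belonging to the \(M\) largest eigenvalues; the toy data and variable names are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))          # N = 100 points in D = 3 dimensions (rows are x_n)
    X = X - X.mean(axis=0)                 # mean-center the data
    N, D = X.shape
    M = 2                                  # target dimensionality, M < D

    S = (X.T @ X) / N                      # covariance matrix S = (1/N) sum_n x_n x_n^T
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
    B = eigvecs[:, ::-1][:, :M]            # columns: eigenvectors of the M largest eigenvalues

    Z = X @ B                              # codes z_n = B^T x_n (rows of Z)
    X_tilde = Z @ B.T                      # reconstructions x~_n = B B^T x_n
    reconstruction_error = np.mean(np.sum((X - X_tilde) ** 2, axis=1))

Because \(\mathbf{S}\) is symmetric, the eigenvectors returned by np.linalg.eigh form an orthonormal set, so the columns of B satisfy the orthonormality conditions above.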

Example 6.1 In \(\mathbb{R}^2\), the canonical basis is: \[ \mathbf{e}_1 = [1, 0]^\top, \quad \mathbf{e}_2 = [0, 1]^\top \]

A point such as \(\mathbf{x} = [5, 3]^\top\) is written in this basis as \(5\mathbf{e}_1 + 3\mathbf{e}_2\).
If we project onto the subspace spanned by \(\mathbf{e}_2\): \[ \tilde{\mathbf{x}} = [0, z]^\top = z\mathbf{e}_2 \] then only the coordinate \(z\) needs to be stored—representing a 1D subspace of \(\mathbb{R}^2\).

This illustrates dimensionality reduction: the original 2D vector is represented by a single coordinate in a 1D subspace.
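
For the point \(\mathbf{x} = [5, 3]^\top\) above, the projection works out to
\[ \tilde{\mathbf{x}} = \mathbf{e}_2\mathbf{e}_2^\top \mathbf{x} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 5 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 3 \end{bmatrix}, \qquad z = \mathbf{e}_2^\top \mathbf{x} = 3, \]
so storing the single number \(z = 3\) suffices to reproduce \(\tilde{\mathbf{x}}\).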


6.1.1 Compression Interpretation

PCA can be viewed as a data compression system:

  • Encoder: \(\mathbf{z} = \mathbf{B}^\top \mathbf{x}\)
    (projects data to lower-dimensional space)
  • Decoder: \(\tilde{\mathbf{x}} = \mathbf{B}\mathbf{z}\)
    (reconstructs data from its low-dimensional code)

The matrix \(\mathbf{B}\) defines both the encoding and decoding transformations.
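
In code, the encoder and decoder are each a single matrix multiplication. The sketch below assumes an orthonormal \(\mathbf{B}\) has already been found (for example as in the earlier snippet); the helper names encode and decode are illustrative, not part of any library.

    def encode(X, B):
        """Encoder: map each row x_n of X to its code z_n = B^T x_n."""
        return X @ B

    def decode(Z, B):
        """Decoder: reconstruct x~_n = B z_n from each code z_n (row of Z)."""
        return Z @ B.T

    # Round trip: decode(encode(X, B), B) applies B B^T to every data point,
    # i.e. it produces the reconstructions x~_n of the original x_n.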


Example 6.2 MNIST Digits

PCA is often demonstrated on the MNIST dataset, whose training set contains 60,000 grayscale images of handwritten digits (0–9).

  • Each image has 28×28 pixels = 784 dimensions.
  • Thus, each image can be represented as a vector \(\mathbf{x} \in \mathbb{R}^{784}\).

By applying PCA, we can compress and visualize the intrinsic lower-dimensional structure of this dataset.
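
One way to run this experiment, assuming scikit-learn is available, is with its fetch_openml loader and PCA class rather than the explicit eigendecomposition above; the choice of two components below is arbitrary and chosen for visualization.

    from sklearn.datasets import fetch_openml
    from sklearn.decomposition import PCA

    # Download MNIST from OpenML; each row is one 28x28 image flattened to 784 pixels.
    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    X = X / 255.0                       # scale pixel values to [0, 1]

    pca = PCA(n_components=2)           # keep M = 2 principal components
    Z = pca.fit_transform(X)            # codes z_n in R^2, e.g. for a scatter plot colored by y
    X_tilde = pca.inverse_transform(Z)  # reconstructed 784-dimensional images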


Exercises

Put some exercises here.