6.5 PCA in High Dimensions
For high-dimensional data, working directly with the \(D \times D\) covariance matrix is computationally expensive: forming it costs \(O(ND^2)\) operations and its eigen-decomposition costs \(O(D^3)\). When the number of samples is much smaller than the number of features (\(N \ll D\)), all but at most \(N\) of the eigenvalues are zero, and the covariance matrix \(\mathbf{S}\) has rank at most \(N\).
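This rank deficiency is easy to verify numerically. The following NumPy snippet is a minimal sketch (the dimensions and variable names are illustrative, not from the text): it draws \(N = 10\) points in \(D = 1000\) dimensions and checks the rank of the resulting covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 10, 1000                          # few samples, many features (N << D)
X = rng.normal(size=(D, N))              # columns are data points
X = X - X.mean(axis=1, keepdims=True)    # center the data

S = (X @ X.T) / N                        # D x D covariance matrix
print(np.linalg.matrix_rank(S))          # at most N (here 9: centering removes one direction)
```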
6.5.1 Dimensionality Reduction Trick for \(N \ll D\)
Given centered data \(\mathbf{X} = [\mathbf{x}_1, \dots, \mathbf{x}_N] \in \mathbb{R}^{D \times N}\), whose columns are the data points, the covariance matrix is
\[ \mathbf{S} = \frac{1}{N} \mathbf{X} \mathbf{X}^\top, \]
and the eigenvector equation is
\[ \mathbf{S} \mathbf{b}_m = \lambda_m \mathbf{b}_m. \]
Left-multiplying both sides by \(\mathbf{X}^\top\) gives
\[ \frac{1}{N} \mathbf{X}^\top \mathbf{X} \underbrace{\mathbf{X}^\top \mathbf{b}_m}_{=:\, \mathbf{c}_m} = \lambda_m \mathbf{X}^\top \mathbf{b}_m, \]
that is,
\[ \frac{1}{N} \mathbf{X}^\top \mathbf{X} \mathbf{c}_m = \lambda_m \mathbf{c}_m, \quad \text{where } \mathbf{c}_m = \mathbf{X}^\top \mathbf{b}_m. \]
Thus:
- \(\mathbf{X}^\top \mathbf{X} / N\) (size \(N \times N\)) shares the same nonzero eigenvalues as \(\mathbf{S}\).
- We can therefore solve the eigenvalue problem on this smaller matrix, at a cost of \(O(N^3)\) instead of \(O(D^3)\).
To recover the eigenvectors of \(\mathbf{S}\), left-multiply the small eigenvalue equation by \(\mathbf{X}\):
\[ \frac{1}{N} \mathbf{X} \mathbf{X}^\top (\mathbf{X} \mathbf{c}_m) = \lambda_m (\mathbf{X} \mathbf{c}_m), \]
so \(\mathbf{b}_m = \mathbf{X} \mathbf{c}_m\) is an eigenvector of \(\mathbf{S}\) with the same eigenvalue \(\lambda_m\). Finally, normalize \(\mathbf{b}_m\) to unit norm for use in PCA.
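The full trick fits in a few lines of NumPy. The sketch below (a hypothetical illustration with my own variable names, assuming a centered data matrix X with columns as data points, as above) solves the small \(N \times N\) eigenvalue problem, maps the eigenvectors back via \(\mathbf{b}_m = \mathbf{X}\mathbf{c}_m\), normalizes them, and checks that they satisfy the original eigenvector equation for \(\mathbf{S}\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 10, 1000
X = rng.normal(size=(D, N))
X = X - X.mean(axis=1, keepdims=True)      # center columns (data points)

# Solve the small eigenvalue problem (1/N) X^T X c_m = lambda_m c_m
G = (X.T @ X) / N                          # N x N instead of D x D
eigvals, C = np.linalg.eigh(G)             # ascending order
eigvals, C = eigvals[::-1], C[:, ::-1]     # sort descending

# Keep only (numerically) nonzero eigenvalues and recover b_m = X c_m
keep = eigvals > 1e-12 * eigvals[0]
eigvals, B = eigvals[keep], X @ C[:, keep]
B = B / np.linalg.norm(B, axis=0, keepdims=True)   # unit-norm principal axes

# Sanity check: each b_m satisfies S b_m = lambda_m b_m
S = (X @ X.T) / N                          # expensive D x D matrix, built only for the check
print(np.allclose(S @ B, B * eigvals))     # True
```

For these sizes the eigen-decomposition runs on a \(10 \times 10\) matrix rather than a \(1000 \times 1000\) one; only the final sanity check touches the full covariance matrix.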