6.4 Eigenvector Computation and Low-Rank Approximations
6.4.1 Computing Eigenvectors via Covariance and SVD
The principal subspace basis of PCA is obtained from the eigenvectors corresponding to the largest eigenvalues of the data covariance matrix (assuming the data is centered, i.e., has zero mean): \[ \mathbf{S} = \frac{1}{N} \sum_{n=1}^N \mathbf{x}_n \mathbf{x}_n^\top = \frac{1}{N} \mathbf{X} \mathbf{X}^\top, \quad \mathbf{X} \in \mathbb{R}^{D \times N}. \] Two equivalent approaches exist:
1. Eigendecomposition of \(\mathbf{S}\): compute the eigenvalues and eigenvectors directly from \(\mathbf{S}\).
2. Singular value decomposition (SVD) of \(\mathbf{X}\): \[ \mathbf{X} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top, \] where:
- \(\mathbf{U} \in \mathbb{R}^{D \times D}\) and \(\mathbf{V} \in \mathbb{R}^{N \times N}\) are orthogonal,
- \(\boldsymbol{\Sigma} \in \mathbb{R}^{D \times N}\) holds the singular values \(\sigma_i \ge 0\) on its diagonal.
Substituting the SVD into the covariance and using \(\mathbf{V}^\top \mathbf{V} = \mathbf{I}\): \[ \mathbf{S} = \frac{1}{N} \mathbf{X} \mathbf{X}^\top = \frac{1}{N} \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\top \mathbf{V} \boldsymbol{\Sigma}^\top \mathbf{U}^\top = \frac{1}{N} \mathbf{U} \boldsymbol{\Sigma} \boldsymbol{\Sigma}^\top \mathbf{U}^\top. \] Hence:
- Columns of \(\mathbf{U}\) are eigenvectors of \(\mathbf{S}\).
- Eigenvalues of \(\mathbf{S}\) are related to singular values by: \[ \lambda_d = \frac{\sigma_d^2}{N}. \] This connects the maximum variance view of PCA with the SVD perspective.
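The following NumPy sketch illustrates this equivalence on synthetic data (the dimensions and the random data matrix are arbitrary, chosen only for illustration): both routes recover the same eigenvalues, and the same eigenvectors up to sign.

```python
import numpy as np

# Toy data: D = 5 dimensions, N = 200 samples stored as columns, centered.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))
X -= X.mean(axis=1, keepdims=True)
N = X.shape[1]

# Approach 1: eigendecomposition of the covariance matrix S.
S = (X @ X.T) / N
eigvals, eigvecs = np.linalg.eigh(S)                 # ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending

# Approach 2: SVD of the data matrix X (singular values come descending).
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# The two agree: lambda_d = sigma_d^2 / N, and the columns of U are
# eigenvectors of S (up to sign, hence the abs comparison).
assert np.allclose(eigvals, sigma**2 / N)
assert np.allclose(np.abs(U), np.abs(eigvecs))
```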
6.4.2 PCA via Low-Rank Matrix Approximations
PCA can be viewed as finding the best rank-\(M\) approximation to \(\mathbf{X}\) that minimizes reconstruction error.
Theorem 6.2 (Eckart–Young). For \(M \le \operatorname{rank}(\mathbf{X})\), the best rank-\(M\) approximation in the spectral norm is \[ \tilde{\mathbf{X}}_M = \operatorname*{arg\,min}_{\operatorname{rank}(\mathbf{A}) \le M} \| \mathbf{X} - \mathbf{A} \|_2, \qquad \| \mathbf{X} - \tilde{\mathbf{X}}_M \|_2 = \sigma_{M+1}, \] i.e., the approximation error equals the largest discarded singular value.
The optimal low-rank approximation is obtained by truncating the SVD: \[ \tilde{\mathbf{X}}_M = \mathbf{U}_M \boldsymbol{\Sigma}_M \mathbf{V}_M^\top, \] where:
- \(\mathbf{U}_M = [\mathbf{u}_1, \dots, \mathbf{u}_M] \in \mathbb{R}^{D \times M}\)
- \(\mathbf{V}_M = [\mathbf{v}_1, \dots, \mathbf{v}_M] \in \mathbb{R}^{N \times M}\)
- \(\boldsymbol{\Sigma}_M \in \mathbb{R}^{M \times M}\) contains the top \(M\) singular values.
This provides the optimal projection matrix for PCA and directly yields the principal components.
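A short NumPy sketch of the truncated SVD (the matrix size and the rank \(M\) are illustrative) also confirms the Eckart–Young error: the spectral-norm residual equals the first discarded singular value \(\sigma_{M+1}\).

```python
import numpy as np

def truncated_svd_approx(X, M):
    """Best rank-M approximation of X via the truncated SVD."""
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :M] @ np.diag(sigma[:M]) @ Vt[:M, :]

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 50))

M = 3
X_M = truncated_svd_approx(X, M)

# Eckart-Young: spectral-norm error equals the (M+1)-th singular value.
sigma = np.linalg.svd(X, compute_uv=False)
err = np.linalg.norm(X - X_M, ord=2)
assert np.isclose(err, sigma[M])
```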
6.4.3 Practical Aspects of Eigenvalue Computation
While eigenvalues can in theory be found as the roots of the characteristic polynomial, the Abel–Ruffini theorem states that no general algebraic solution exists for polynomials of degree 5 or higher.
In practice:
- Numerical methods (e.g., `np.linalg.eigh`, `np.linalg.svd`) compute eigenvalues and singular values iteratively.
- When only a few leading eigenvectors are needed, iterative methods are more efficient than full decompositions.
A simple iterative algorithm to compute the dominant eigenvector is power iteration: \[ \mathbf{x}_{k+1} = \frac{\mathbf{S} \mathbf{x}_k}{\| \mathbf{S} \mathbf{x}_k \|}. \] Provided the largest eigenvalue (in absolute value) is strictly dominant and the initial vector \(\mathbf{x}_0\) is not orthogonal to the corresponding eigenvector, this iteration converges to that eigenvector. The original Google PageRank algorithm used a similar approach to rank web pages.
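A minimal NumPy implementation of power iteration, checked against np.linalg.eigvalsh (the tolerance, iteration cap, and test matrix below are illustrative choices, not prescribed by the text):

```python
import numpy as np

def power_iteration(S, num_iters=1000, tol=1e-10, seed=0):
    """Dominant eigenvector/eigenvalue of a symmetric PSD matrix S."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=S.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        x_prev = x
        x = S @ x
        x /= np.linalg.norm(x)           # normalize each iterate
        if np.linalg.norm(x - x_prev) < tol:
            break
    # The Rayleigh quotient of the converged vector is the eigenvalue.
    return x, x @ S @ x

# Check against a full eigendecomposition on a random PSD matrix.
rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
S = A @ A.T                              # symmetric positive semidefinite
v, lam = power_iteration(S)
assert np.isclose(lam, np.linalg.eigvalsh(S)[-1])
```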