6.7 Latent Variable Perspective
While PCA can be derived geometrically or algebraically, it can also be viewed probabilistically using a latent variable model.
6.7.1 Probabilistic PCA (PPCA)
PPCA (Tipping & Bishop, 1999) introduces a latent variable \(\mathbf{z} \in \mathbb{R}^M\) with prior
\[ p(\mathbf{z}) = \mathcal{N}(\mathbf{z} | 0, \mathbf{I}), \]
and a linear mapping to the observed data:
\[ \mathbf{x} = \mathbf{Bz} + \boldsymbol{\mu} + \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(0, \sigma^2 \mathbf{I}). \]
Thus
\[ p(\mathbf{x} | \mathbf{z}, \mathbf{B}, \boldsymbol{\mu}, \sigma^2) = \mathcal{N}(\mathbf{x} | \mathbf{Bz} + \boldsymbol{\mu}, \sigma^2 \mathbf{I}). \]
The joint distribution is
\[ p(\mathbf{x}, \mathbf{z} | \mathbf{B}, \boldsymbol{\mu}, \sigma^2) = p(\mathbf{x} | \mathbf{z}, \mathbf{B}, \boldsymbol{\mu}, \sigma^2)\,p(\mathbf{z}). \]
This defines the generative process (a code sketch follows the list):
- Sample \(\mathbf{z} \sim \mathcal{N}(0, \mathbf{I})\)
- Sample \(\mathbf{x} \sim \mathcal{N}(\mathbf{Bz} + \boldsymbol{\mu}, \sigma^2 \mathbf{I})\)
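This two-step procedure translates directly into code. The following is a minimal NumPy sketch; the dimensions and the values of \(\mathbf{B}\), \(\boldsymbol{\mu}\), and \(\sigma\) are arbitrary illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

D, M = 5, 2                       # observed and latent dimensionality (illustrative)
B = rng.standard_normal((D, M))   # linear map from latent to observed space
mu = rng.standard_normal(D)       # offset mu
sigma = 0.1                       # noise standard deviation

# Generative process of PPCA:
z = rng.standard_normal(M)              # z ~ N(0, I)
eps = sigma * rng.standard_normal(D)    # eps ~ N(0, sigma^2 I)
x = B @ z + mu + eps                    # x | z ~ N(Bz + mu, sigma^2 I)
```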
6.7.2 Likelihood and Covariance Structure
By integrating out \(\mathbf{z}\): \[ p(\mathbf{x} | \mathbf{B}, \boldsymbol{\mu}, \sigma^2) = \int p(\mathbf{x} | \mathbf{z}, \mathbf{B}, \boldsymbol{\mu}, \sigma^2)p(\mathbf{z})\, d\mathbf{z} = \mathcal{N}(\mathbf{x} | \boldsymbol{\mu}, \mathbf{B}\mathbf{B}^\top + \sigma^2 \mathbf{I}). \]
- Mean: \(\mathbb{E}[\mathbf{x}] = \boldsymbol{\mu}\)
- Covariance: \(\text{Var}[\mathbf{x}] = \mathbf{B}\mathbf{B}^\top + \sigma^2 \mathbf{I}\)
Hence, the covariance of the observed data combines the low-rank structure \(\mathbf{B}\mathbf{B}^\top\) induced by the latent variables with the isotropic observation noise \(\sigma^2 \mathbf{I}\).
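As a sanity check, this marginal covariance can be compared against an empirical estimate obtained by sampling from the generative model. The sketch below again uses arbitrary parameter values; the remaining discrepancy is only Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
D, M, sigma = 5, 2, 0.1
B = rng.standard_normal((D, M))
mu = rng.standard_normal(D)

# Sample N points from the generative model: x = Bz + mu + eps.
N = 200_000
Z = rng.standard_normal((N, M))
X = Z @ B.T + mu + sigma * rng.standard_normal((N, D))

analytic = B @ B.T + sigma**2 * np.eye(D)     # Var[x] = B B^T + sigma^2 I
empirical = np.cov(X, rowvar=False)
print(np.abs(empirical - analytic).max())     # close to 0 up to sampling noise
```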
6.7.3 Posterior Distribution
Given an observation \(\mathbf{x}\), the posterior over the latent variable is
\[ p(\mathbf{z} | \mathbf{x}) = \mathcal{N}(\mathbf{z} | \mathbf{m}, \mathbf{C}), \]
with
\[ \mathbf{m} = \mathbf{B}^\top (\mathbf{B}\mathbf{B}^\top + \sigma^2 \mathbf{I})^{-1}(\mathbf{x} - \boldsymbol{\mu}), \]
\[ \mathbf{C} = \mathbf{I} - \mathbf{B}^\top (\mathbf{B}\mathbf{B}^\top + \sigma^2 \mathbf{I})^{-1} \mathbf{B}. \]
The covariance \(\mathbf{C}\) quantifies the uncertainty of the latent embedding; note that it does not depend on the observation \(\mathbf{x}\), so it characterizes the embedding as a whole rather than individual points:
- Small determinant of \(\mathbf{C}\) → confident, tightly concentrated embeddings
- Large determinant of \(\mathbf{C}\) → uncertain embeddings, e.g., when the noise \(\sigma^2\) is large relative to the structure captured by \(\mathbf{B}\)
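These formulas are straightforward to transcribe into NumPy. A minimal sketch follows; the helper name `ppca_posterior` is ours, not from the text:

```python
import numpy as np

def ppca_posterior(x, B, mu, sigma):
    """Return the mean m and covariance C of p(z | x) for the PPCA model."""
    D, M = B.shape
    S_inv = np.linalg.inv(B @ B.T + sigma**2 * np.eye(D))  # (B B^T + sigma^2 I)^{-1}
    m = B.T @ S_inv @ (x - mu)          # posterior mean
    C = np.eye(M) - B.T @ S_inv @ B     # posterior covariance (independent of x)
    return m, C
```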
To visualize or reconstruct data:
- Sample \(\mathbf{z}_\ast \sim p(\mathbf{z} | \mathbf{x}_\ast)\)
- Generate \(\tilde{\mathbf{x}}_\ast \sim p(\mathbf{x} | \mathbf{z}_\ast, \mathbf{B}, \boldsymbol{\mu}, \sigma^2)\).
This process allows data generation and exploration of the latent structure, and it forms a bridge between classical PCA and modern generative models: in the limit \(\sigma^2 \to 0\), the maximum likelihood solution of PPCA recovers classical PCA.
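Putting the pieces together, the sketch below samples a latent code from the posterior of a single observation and pushes it back through the generative model to obtain a reconstruction. All parameter values are again arbitrary and for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
D, M, sigma = 5, 2, 0.1
B = rng.standard_normal((D, M))
mu = rng.standard_normal(D)

# An observation x_* drawn from the model itself (illustrative).
x_star = B @ rng.standard_normal(M) + mu + sigma * rng.standard_normal(D)

# Posterior p(z | x_*) = N(m, C), using the formulas above.
S_inv = np.linalg.inv(B @ B.T + sigma**2 * np.eye(D))
m = B.T @ S_inv @ (x_star - mu)
C = np.eye(M) - B.T @ S_inv @ B

# Sample z_* from the posterior, then generate a reconstruction x_tilde.
z_star = rng.multivariate_normal(m, C)                          # z_* ~ p(z | x_*)
x_tilde = rng.multivariate_normal(B @ z_star + mu, sigma**2 * np.eye(D))
print(x_star)
print(x_tilde)
```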