7.4 Latent-Variable Perspective

A GMM can be interpreted as a latent-variable model in which each data point \(\mathbf{x}_n\) is associated with a hidden indicator vector \(z_n = [z_{n1}, \dots, z_{nK}]^{\top} \in \{0,1\}^K\), whose entry \(z_{nk} = 1\) indicates that the \(k^{th}\) mixture component generated \(\mathbf{x}_n\).

7.4.1 Generative Model

A component is selected according to the mixture probabilities \(\pi = [\pi_1, \dots, \pi_K]\), and each data point is generated in two steps:

  1. Sample \(z \sim \text{Categorical}(\pi)\).
  2. Sample \(\mathbf{x} \sim \mathcal{N}(\mu_k, \boldsymbol{\Sigma}_k)\) given \(z_k = 1\).

Thus, the joint distribution is: \[ p(\mathbf{x}, z_k = 1) = \pi_k \, \mathcal{N}(\mathbf{x} | \mu_k, \boldsymbol{\Sigma}_k). \]
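
This two-step (ancestral) sampling procedure translates directly into code. The following is a minimal NumPy sketch; the function name sample_gmm and the example parameters at the end are illustrative choices, not part of the text:

    import numpy as np

    def sample_gmm(pi, mus, Sigmas, n_samples, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        K = len(pi)
        # Step 1: sample the latent component index z_n ~ Categorical(pi) for every point.
        z = rng.choice(K, size=n_samples, p=pi)
        # Step 2: sample x_n from the Gaussian selected by z_n.
        X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
        return X, z

    # Example: a three-component mixture in two dimensions (illustrative values).
    pi = np.array([0.5, 0.3, 0.2])
    mus = [np.zeros(2), np.array([3.0, 3.0]), np.array([-3.0, 2.0])]
    Sigmas = [np.eye(2)] * 3
    X, z = sample_gmm(pi, mus, Sigmas, n_samples=500)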

7.4.2 Likelihood

The marginal likelihood of a single data point is obtained by summing over the latent states: \[ p(\mathbf{x} | \boldsymbol{\theta}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} | \mu_k, \boldsymbol{\Sigma}_k). \]

For a dataset \(\mathcal{X} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}\), the total likelihood is: \[ p(\mathcal{X} | \boldsymbol{\theta}) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x}_n | \mu_k, \boldsymbol{\Sigma}_k). \] This formulation is identical to the GMM likelihood derived earlier.
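
A sketch of how this log-likelihood might be evaluated numerically is given below, using NumPy and SciPy; the function name gmm_log_likelihood is illustrative. The sum over components is carried out in log space with the log-sum-exp trick, since individual Gaussian densities can underflow:

    import numpy as np
    from scipy.special import logsumexp
    from scipy.stats import multivariate_normal

    def gmm_log_likelihood(X, pi, mus, Sigmas):
        # log_terms[n, k] = log pi_k + log N(x_n | mu_k, Sigma_k)
        log_terms = np.column_stack([
            np.log(pi[k]) + multivariate_normal.logpdf(X, mean=mus[k], cov=Sigmas[k])
            for k in range(len(pi))
        ])
        # log p(X | theta) = sum_n log sum_k exp(log_terms[n, k])
        return logsumexp(log_terms, axis=1).sum()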


7.4.3 Posterior Distribution

Using Bayes’ theorem, the posterior probability (responsibility) that component \(k\) generated \(\mathbf{x}_n\) is: \[ p(z_k = 1 | \mathbf{x}_n) = \frac{\pi_k \, \mathcal{N}(\mathbf{x}_n | \mu_k, \boldsymbol{\Sigma}_k)} {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(\mathbf{x}_n | \mu_j, \boldsymbol{\Sigma}_j)} = r_{nk}. \] Thus, the responsibilities from the EM algorithm have a probabilistic interpretation as posterior probabilities.
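
As a sketch, the responsibilities for an entire dataset can be computed in a few lines of NumPy/SciPy (responsibilities is an illustrative function name):

    import numpy as np
    from scipy.stats import multivariate_normal

    def responsibilities(X, pi, mus, Sigmas):
        # Numerator of Bayes' theorem: weighted[n, k] = pi_k * N(x_n | mu_k, Sigma_k)
        weighted = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
            for k in range(len(pi))
        ])
        # Normalise each row so that sum_k r[n, k] = 1
        return weighted / weighted.sum(axis=1, keepdims=True)

Each row of the returned array is a categorical distribution over the \(K\) components for the corresponding data point.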


7.4.4 Extension to Full Dataset

Each data point \(\mathbf{x}_n\) has its own latent variable \(z_n\), forming a set of hidden assignments: \[ z_n = [z_{n1}, \dots, z_{nK}]^{\top}. \] The same prior \(\pi\) applies to all, and the joint conditional distribution factorizes as: \[ p(\mathbf{x}_1, \dots, \mathbf{x}_N | z_1, \dots, z_N) = \prod_{n=1}^{N} p(\mathbf{x}_n | z_n). \] Responsibilities \(r_{nk}\) again represent \(p(z_{nk} = 1 | \mathbf{x}_n)\).
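
Written in this one-hot notation, the per-point factors are \[ p(z_n) = \prod_{k=1}^{K} \pi_k^{z_{nk}}, \qquad p(\mathbf{x}_n | z_n) = \prod_{k=1}^{K} \mathcal{N}(\mathbf{x}_n | \mu_k, \boldsymbol{\Sigma}_k)^{z_{nk}}, \] so that the complete-data likelihood of the dataset and its assignments is \[ p(\mathcal{X}, z_1, \dots, z_N | \boldsymbol{\theta}) = \prod_{n=1}^{N} \prod_{k=1}^{K} \big[ \pi_k \, \mathcal{N}(\mathbf{x}_n | \mu_k, \boldsymbol{\Sigma}_k) \big]^{z_{nk}}. \] This expression is the starting point for the EM derivation in the next subsection.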


7.4.5 EM Algorithm Revisited

From the latent-variable perspective, EM can be derived as maximizing the expected complete-data log-likelihood: \[ Q(\boldsymbol{\theta} | \boldsymbol{\theta}^{(t)}) = \mathbb{E}_{z | \mathbf{x}, \boldsymbol{\theta}^{(t)}}[\log p(\mathbf{x}, z | \boldsymbol{\theta})]. \]

  • E-step: Compute the expected value of the log-likelihood under the posterior \(p(z | \mathbf{x}, \boldsymbol{\theta}^{(t)})\).
  • M-step: Maximize this expectation with respect to \(\boldsymbol{\theta}\).
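
Applied to the full dataset, and using the fact that each binary \(z_{nk}\) has posterior mean \(\mathbb{E}[z_{nk} | \mathbf{x}_n, \boldsymbol{\theta}^{(t)}] = r_{nk}\), the expectation takes the closed form \[ Q(\boldsymbol{\theta} | \boldsymbol{\theta}^{(t)}) = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \big( \log \pi_k + \log \mathcal{N}(\mathbf{x}_n | \mu_k, \boldsymbol{\Sigma}_k) \big), \] where the responsibilities \(r_{nk}\) are evaluated at the current parameters \(\boldsymbol{\theta}^{(t)}\). Maximizing this expression with respect to \(\pi_k\), \(\mu_k\), and \(\boldsymbol{\Sigma}_k\) yields the closed-form M-step updates.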

Each iteration is guaranteed not to decrease the log-likelihood, but the algorithm may converge only to a local maximum, so the result can depend on the initialization.
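
The complete loop can be sketched as follows. This is a minimal NumPy/SciPy sketch under simple assumptions: means initialised at randomly chosen data points, a fixed number of iterations rather than a convergence test, and a small ridge term added to the covariances for numerical stability; em_gmm is an illustrative name, not a library routine:

    import numpy as np
    from scipy.stats import multivariate_normal

    def em_gmm(X, K, n_iters=100, seed=0):
        rng = np.random.default_rng(seed)
        N, D = X.shape
        # Initialisation: uniform weights, random data points as means, shared covariance.
        pi = np.full(K, 1.0 / K)
        mus = X[rng.choice(N, size=K, replace=False)]
        Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])

        for _ in range(n_iters):
            # E-step: responsibilities r[n, k] under the current parameters.
            weighted = np.column_stack([
                pi[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
                for k in range(K)
            ])
            r = weighted / weighted.sum(axis=1, keepdims=True)

            # M-step: closed-form updates from the responsibilities.
            Nk = r.sum(axis=0)                      # effective number of points per component
            pi = Nk / N
            mus = (r.T @ X) / Nk[:, None]
            for k in range(K):
                diff = X - mus[k]
                Sigmas[k] = (r[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)

        return pi, mus, Sigmas, r

In practice one would monitor the log-likelihood across iterations and restart from several initialisations, since different runs can end in different local maxima.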

