7.1 Gaussian Mixture Models (GMMs)

Definition 7.2 A Gaussian Mixture Model (GMM) combines multiple Gaussian distributions: \[ p(\mathbf{x} \mid \boldsymbol{\theta}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k), \] where:

  • \(\boldsymbol{\theta} = \{\pi_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k : k = 1, \dots, K\}\),
  • each \(\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)\) is a Gaussian component, and
  • the mixture weights \(\pi_k\) satisfy \(\pi_k \ge 0\) and \(\sum_{k=1}^{K} \pi_k = 1\).

This convex combination of Gaussians provides far greater flexibility than a single Gaussian for modeling multimodal or clustered data.
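
As a quick numerical illustration of Definition 7.2, the sketch below evaluates the density \(p(\mathbf{x} \mid \boldsymbol{\theta})\) of a small bivariate mixture. It is an illustrative Python sketch only: the parameter values and the helper name gmm_density are made up for this example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, weights, means, covs):
    """Evaluate p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=S)
               for w, m, S in zip(weights, means, covs))

# Hypothetical 2-component bivariate mixture (illustrative values only).
weights = [0.4, 0.6]                                   # pi_k: non-negative, sum to 1
means = [np.array([-2.0, 0.0]), np.array([3.0, 1.0])]  # mu_k
covs = [np.eye(2), 2.0 * np.eye(2)]                    # Sigma_k

print(gmm_density(np.array([0.0, 0.0]), weights, means, covs))
```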

Example 7.1 A GMM represents a probability distribution as a weighted sum of Gaussian (normal) distributions.

Let \(x \in \mathbb{R}\) be a continuous random variable. A 2-component GMM is defined as: \[ p(x) = \pi_1 \, \mathcal{N}(x \mid \mu_1, \sigma_1^2) + \pi_2 \, \mathcal{N}(x \mid \mu_2, \sigma_2^2), \] where:

  • \(\pi_1, \pi_2 \ge 0\) are mixing coefficients
  • \(\pi_1 + \pi_2 = 1\)
  • \(\mathcal{N}(x \mid \mu_k, \sigma_k^2)\) is a Gaussian pdf

Let: \[ \pi_1 = 0.4, \quad \pi_2 = 0.6 \] \[ \mu_1 = -2, \quad \sigma_1^2 = 1 \] \[ \mu_2 = 3, \quad \sigma_2^2 = 2 \] Then the model becomes: \[ p(x) = 0.4 \, \mathcal{N}(x \mid -2, 1) + 0.6 \, \mathcal{N}(x \mid 3, 2) \]

So,

  • With probability 0.4, a data point is generated from a Gaussian centered at −2
  • With probability 0.6, a data point is generated from a Gaussian centered at 3
  • The overall distribution is bimodal, with two peaks.
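
This generative reading can be simulated directly. The snippet below is a minimal, illustrative Python sketch (NumPy only) that first picks a component with probabilities 0.4 and 0.6 and then samples from the chosen Gaussian; a histogram of the resulting draws shows the two peaks.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Step 1: choose a component z in {0, 1} with probabilities (0.4, 0.6).
z = rng.choice(2, size=n, p=[0.4, 0.6])

# Step 2: draw from the chosen Gaussian; note scale = sqrt(variance).
means = np.array([-2.0, 3.0])
stds = np.array([1.0, np.sqrt(2.0)])
x = rng.normal(loc=means[z], scale=stds[z])

# The sample mean should be close to 0.4*(-2) + 0.6*3 = 1.0.
print(x.mean())
```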

Given an observed value \(x\), the probability that it came from component \(k\) is:

\[ \gamma_k(x) = p(z = k \mid x) = \frac{\pi_k \mathcal{N}(x \mid \mu_k, \sigma_k^2)} {\sum_{j=1}^{2} \pi_j \mathcal{N}(x \mid \mu_j, \sigma_j^2)} \] These are called responsibilities and are used in the Expectation–Maximization (EM) algorithm.

Example Calculation

Suppose we observe \(x = 0\).

  • Component 1 likelihood: \[ \mathcal{N}(0 \mid -2, 1) = \frac{1}{\sqrt{2\pi}} e^{-2} \approx \frac{0.1353}{2.5066} \approx 0.054. \]

  • Component 2 likelihood: \[ \mathcal{N}(0 \mid 3, 2) = \frac{1}{\sqrt{4\pi}} e^{-9/4} \approx \frac{0.1054}{3.5449} \approx 0.0297. \] Weighting these likelihoods by \(\pi_1\) and \(\pi_2\), we compute \(\gamma_1(0)\) and \(\gamma_2(0)\) to determine which Gaussian most likely generated the point: \[\gamma_1(0) = \dfrac{0.4(0.054)}{0.4(0.054) + 0.6(0.0297)} \approx 0.548, \quad \gamma_2(0) = \dfrac{0.6(0.0297)}{0.4(0.054) + 0.6(0.0297)} \approx 0.452. \] Since \(\gamma_1(0) > \gamma_2(0)\), the point more likely came from Gaussian 1.
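
The arithmetic above can be verified numerically. The snippet below is a quick, illustrative Python check of this worked example (not a general-purpose implementation); it recomputes the component likelihoods and responsibilities at \(x = 0\).

```python
import numpy as np
from scipy.stats import norm

pi = np.array([0.4, 0.6])    # mixing coefficients
mu = np.array([-2.0, 3.0])   # means
var = np.array([1.0, 2.0])   # variances

x = 0.0
lik = norm.pdf(x, loc=mu, scale=np.sqrt(var))   # approx [0.054, 0.0297]
gamma = pi * lik / np.sum(pi * lik)             # responsibilities

print(lik)    # component likelihoods
print(gamma)  # approx [0.548, 0.452]
```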

Unlike linear regression or PCA, GMMs do not admit a closed-form maximum likelihood solution. Instead, parameters are estimated iteratively, most commonly with the Expectation–Maximization (EM) algorithm.
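
To make the iterative procedure concrete, here is a compact EM sketch for a one-dimensional, two-component GMM, fit to data drawn from the example mixture above. It is an illustrative Python sketch only: the function name em_gmm_1d is made up, and initialization and convergence checks are simplified relative to a production implementation such as sklearn.mixture.GaussianMixture.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, n_components=2, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # Crude initialization: uniform weights, random means, pooled variance.
    pi = np.full(n_components, 1.0 / n_components)
    mu = rng.choice(x, size=n_components, replace=False)
    var = np.full(n_components, x.var())

    for _ in range(n_iter):
        # E-step: responsibilities gamma[n, k] = p(z = k | x_n).
        lik = norm.pdf(x[:, None], loc=mu, scale=np.sqrt(var))
        num = pi * lik
        gamma = num / num.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances from weighted data.
        Nk = gamma.sum(axis=0)
        pi = Nk / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, var

# Data drawn from the example mixture 0.4 N(-2, 1) + 0.6 N(3, 2).
rng = np.random.default_rng(1)
z = rng.choice(2, size=2000, p=[0.4, 0.6])
x = rng.normal(loc=np.where(z == 0, -2.0, 3.0),
               scale=np.where(z == 0, 1.0, np.sqrt(2.0)))

print(em_gmm_1d(x))  # estimates should be close to the true parameters
```

Each iteration alternates an E-step, which computes exactly the responsibilities \(\gamma_k(x)\) defined above, with an M-step, which re-estimates \(\pi_k\), \(\mu_k\), and \(\sigma_k^2\) from the responsibility-weighted data.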


Exercises