Chapter 7 Density Estimation with Gaussian Mixture Models

In addition to regression and dimensionality reduction, density estimation is a core pillar of machine learning. Its goal is to represent data compactly by estimating the underlying probability density function that generated it.

Instead of storing all data points, density estimation models the data using a parametric family of distributions, such as a Gaussian.

For example, we can summarize a dataset by its mean and variance, estimated via maximum likelihood estimation (MLE) or maximum a posteriori (MAP) estimation. However, a single Gaussian distribution often fits complex or multimodal data (data with multiple clusters) poorly. Mixture models address this limitation.
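As a quick illustration, the following sketch fits a single Gaussian to one-dimensional data by maximum likelihood. For a Gaussian, the MLE of the mean and variance are simply the sample mean and the (biased) sample variance. The dataset and its parameters are synthetic, chosen only for illustration.

```python
import numpy as np

# Synthetic one-dimensional dataset (illustrative values, not from the text)
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=500)

# For a single Gaussian, the maximum likelihood estimates are the
# sample mean and the biased (ddof=0) sample variance.
mu_mle = x.mean()
var_mle = x.var()

print(f"MLE mean: {mu_mle:.3f}  MLE variance: {var_mle:.3f}")
```

If the data were instead drawn from two well-separated clusters, these two numbers would place most of the estimated density between the clusters, where few data points actually lie.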

Definition 7.1 A mixture model represents a probability density as a convex combination of simpler component distributions: \[ p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k p_k(\mathbf{x}), \] subject to: \[ 0 \leq \pi_k \leq 1, \quad \sum_{k=1}^{K} \pi_k = 1, \] where

  • \(p_k(\mathbf{x})\) are the individual component distributions (e.g., Gaussian, Bernoulli), and
  • \(\pi_k\) are the mixture weights, representing the contribution of the \(k\)-th component to the overall density.

Mixture models can capture multimodal structure in data and are therefore more expressive than any single parametric distribution.
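To make Definition 7.1 concrete, here is a minimal sketch, assuming NumPy and SciPy and illustrative parameter values, that evaluates a two-component Gaussian mixture density \(p(\mathbf{x}) = \sum_k \pi_k \mathcal{N}(x \,|\, \mu_k, \sigma_k^2)\) and verifies the convexity constraints on the weights.

```python
import numpy as np
from scipy.stats import norm

# Illustrative two-component Gaussian mixture (parameters are assumptions)
pi = np.array([0.3, 0.7])      # mixture weights: non-negative, sum to 1
mu = np.array([-2.0, 3.0])     # component means
sigma = np.array([0.5, 1.0])   # component standard deviations

# Check the convex-combination constraints from Definition 7.1
assert np.all(pi >= 0) and np.isclose(pi.sum(), 1.0)

def mixture_pdf(x):
    """Evaluate p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2) for an array x."""
    # Broadcasting: x[:, None] has shape (n, 1), component parameters (K,),
    # so norm.pdf returns an (n, K) array of per-component densities.
    return np.sum(pi * norm.pdf(x[:, None], loc=mu, scale=sigma), axis=1)

xs = np.linspace(-5.0, 7.0, 13)
print(mixture_pdf(xs))         # bimodal: mass concentrates near -2 and 3
```

The resulting density has two modes, near \(-2\) and \(3\), which no single Gaussian can represent regardless of how its mean and variance are chosen.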