8.1 Separating Hyperplanes
The key idea of SVMs is to separate data points of different classes using a hyperplane in \(\mathbb{R}^D\). A hyperplane divides the space into two regions, each corresponding to one of the two classes.
Definition 8.1 A hyperplane in \(\mathbb{R}^D\) is defined as: \[ \{ \mathbf{x} \in \mathbb{R}^D : f(\mathbf{x}) = 0 \}, \] where: \[ f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b. \] Here:
- \(\mathbf{w} \in \mathbb{R}^D\) is the normal vector to the hyperplane, and
- \(b \in \mathbb{R}\) is the bias (intercept) term.
The normal vector \(\mathbf{w}\) is orthogonal to the hyperplane: for any two points \(\mathbf{x}_a, \mathbf{x}_b\) lying on the hyperplane, \(f(\mathbf{x}_a) = f(\mathbf{x}_b) = 0\), and subtracting the two equations gives \[ \langle \mathbf{w}, \mathbf{x}_a - \mathbf{x}_b \rangle = 0, \] so \(\mathbf{w}\) is perpendicular to every direction lying within the hyperplane.
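As a minimal NumPy sketch of Definition 8.1 (the function name `hyperplane_value` is illustrative, not from the text):

```python
import numpy as np

def hyperplane_value(w: np.ndarray, b: float, x: np.ndarray) -> float:
    """Evaluate f(x) = <w, x> + b for the hyperplane with normal w and bias b."""
    return float(np.dot(w, x)) + b

# Example in R^2: the line x1 + x2 - 1 = 0, with normal w = (1, 1)
w, b = np.array([1.0, 1.0]), -1.0
print(hyperplane_value(w, b, np.array([0.5, 0.5])))  # 0.0: lies on the hyperplane
print(hyperplane_value(w, b, np.array([1.0, 1.0])))  # 1.0: positive side
```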
8.1.1 Classification Rule
A new example \(\mathbf{x}_{\text{test}}\) is classified based on the sign of \(f(\mathbf{x}_{\text{test}})\): \[ \hat{y} = \operatorname{sgn}\!\big(f(\mathbf{x}_{\text{test}})\big) = \begin{cases} +1, & \text{if } f(\mathbf{x}_{\text{test}}) \ge 0, \\ -1, & \text{if } f(\mathbf{x}_{\text{test}}) < 0. \end{cases} \]
Geometrically:
- Points with \(f(\mathbf{x}) > 0\) lie on the positive side of the hyperplane.
- Points with \(f(\mathbf{x}) < 0\) lie on the negative side.
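The rule above amounts to a one-line predictor. A self-contained sketch (the helper `classify` is illustrative):

```python
import numpy as np

def classify(w: np.ndarray, b: float, x_test: np.ndarray) -> int:
    """Predict a label from the sign of f(x_test) = <w, x_test> + b.
    The boundary case f = 0 is assigned to the positive class."""
    f = float(np.dot(w, x_test)) + b
    return +1 if f >= 0 else -1

# Example: the hyperplane x1 + x2 - 1 = 0 in R^2
print(classify(np.array([1.0, 1.0]), -1.0, np.array([2.0, 0.0])))  # +1
print(classify(np.array([1.0, 1.0]), -1.0, np.array([0.0, 0.0])))  # -1
```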
8.1.2 Training Objective
During training, we want:
- Positive examples (\(y_n = +1\)) to be on the positive side:
\[
\langle \mathbf{w}, \mathbf{x}_n \rangle + b \ge 0,
\]
- Negative examples (\(y_n = -1\)) to be on the negative side:
\[
\langle \mathbf{w}, \mathbf{x}_n \rangle + b < 0.
\]
Both conditions can be combined compactly as: \[ y_n (\langle \mathbf{w}, \mathbf{x}_n \rangle + b) \ge 0. \] This inequality expresses that all examples are correctly classified relative to the hyperplane defined by \((\mathbf{w}, b)\), with the boundary case \(f(\mathbf{x}_n) = 0\) counted as positive, consistent with the classification rule above.
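Checking this combined condition over a data set is a one-liner in NumPy. A sketch, assuming labels are stored as \(\pm 1\) (the function `separates` is illustrative):

```python
import numpy as np

def separates(w: np.ndarray, b: float, X: np.ndarray, y: np.ndarray) -> bool:
    """Check y_n * (<w, x_n> + b) >= 0 for every training example.
    X has shape (N, D); y contains labels in {-1, +1}."""
    margins = y * (X @ w + b)
    return bool(np.all(margins >= 0))

# Toy data in R^2 separated by the hyperplane x1 + x2 - 1 = 0
X = np.array([[2.0, 1.0], [1.5, 1.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([+1, +1, -1, -1])
print(separates(np.array([1.0, 1.0]), -1.0, X, y))  # True
```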
8.1.3 Geometric Interpretation
- The vector \(\mathbf{w}\) determines the orientation of the hyperplane.
- The scalar \(b\) shifts the hyperplane along the direction of \(\mathbf{w}\); the hyperplane sits at signed distance \(-b / \lVert \mathbf{w} \rVert\) from the origin.
- Classification is performed by checking on which side of the hyperplane each example lies.
- The SVM’s goal is to find the hyperplane that maximizes the margin — the distance between the hyperplane and the nearest data points from each class.
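Using the standard fact that the distance from a point \(\mathbf{x}\) to the hyperplane is \(|f(\mathbf{x})| / \lVert \mathbf{w} \rVert\), the margin attained by a given \((\mathbf{w}, b)\) on a data set can be sketched as follows (the function `geometric_margin` is illustrative; how to maximize this quantity is not shown here):

```python
import numpy as np

def geometric_margin(w: np.ndarray, b: float, X: np.ndarray, y: np.ndarray) -> float:
    """Smallest signed distance y_n * f(x_n) / ||w|| over the data.
    Positive if and only if the hyperplane separates the two classes."""
    return float(np.min(y * (X @ w + b)) / np.linalg.norm(w))

# Same toy data as before: distance of the closest point to x1 + x2 - 1 = 0
X = np.array([[2.0, 1.0], [1.5, 1.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([+1, +1, -1, -1])
print(geometric_margin(np.array([1.0, 1.0]), -1.0, X, y))  # ~0.707
```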