8.1 Separating Hyperplanes
The key idea of SVMs is to separate data points of different classes using a hyperplane in \(\mathbb{R}^D\). A hyperplane divides the space into two regions, each corresponding to one of the two classes.
Definition 8.1 A hyperplane in \(\mathbb{R}^D\) is defined as: \[ \{ \mathbf{x} \in \mathbb{R}^D : f(\mathbf{x}) = 0 \}, \] where: \[ f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b. \] Here:
- \(\mathbf{w} \in \mathbb{R}^D\) is the normal vector to the hyperplane, and
- \(b \in \mathbb{R}\) is the bias (intercept) term.
The normal vector \(\mathbf{w}\) is orthogonal to the hyperplane: for any two points \(\mathbf{x}_a, \mathbf{x}_b\) lying on the hyperplane, \(f(\mathbf{x}_a) = f(\mathbf{x}_b) = 0\), and subtracting the two equations gives \[ \langle \mathbf{w}, \mathbf{x}_a - \mathbf{x}_b \rangle = 0, \] so \(\mathbf{w}\) is perpendicular to every direction lying within the hyperplane.
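As a minimal NumPy sketch of Definition 8.1 (the function name `hyperplane_value` is illustrative, not from the text):

```python
import numpy as np

def hyperplane_value(w: np.ndarray, b: float, x: np.ndarray) -> float:
    """Evaluate f(x) = <w, x> + b for the hyperplane with normal w and bias b."""
    return float(np.dot(w, x)) + b

# Example in R^2: the line x1 + x2 - 1 = 0, with normal w = (1, 1)
w, b = np.array([1.0, 1.0]), -1.0
print(hyperplane_value(w, b, np.array([0.5, 0.5])))  # 0.0: lies on the hyperplane
print(hyperplane_value(w, b, np.array([1.0, 1.0])))  # 1.0: positive side
```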
8.1.1 Classification Rule
A new example \(\mathbf{x}_{\text{test}}\) is classified based on the sign of \(f(\mathbf{x}_{\text{test}})\): \[ \hat{y} = \operatorname{sgn}\!\big(f(\mathbf{x}_{\text{test}})\big) = \begin{cases} +1, & \text{if } f(\mathbf{x}_{\text{test}}) \ge 0, \\ -1, & \text{if } f(\mathbf{x}_{\text{test}}) < 0. \end{cases} \]
Geometrically:
- Points with \(f(\mathbf{x}) > 0\) lie on the positive side of the hyperplane.
- Points with \(f(\mathbf{x}) < 0\) lie on the negative side.
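The rule above amounts to a one-line predictor. A self-contained sketch (the helper `classify` is illustrative):

```python
import numpy as np

def classify(w: np.ndarray, b: float, x_test: np.ndarray) -> int:
    """Predict a label from the sign of f(x_test) = <w, x_test> + b.
    The boundary case f = 0 is assigned to the positive class."""
    f = float(np.dot(w, x_test)) + b
    return +1 if f >= 0 else -1

# Example: the hyperplane x1 + x2 - 1 = 0 in R^2
print(classify(np.array([1.0, 1.0]), -1.0, np.array([2.0, 0.0])))  # +1
print(classify(np.array([1.0, 1.0]), -1.0, np.array([0.0, 0.0])))  # -1
```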
8.1.2 Training Objective
During training, we want:
- Positive examples (\(y_n = +1\)) to be on the positive side:
\[
\langle \mathbf{w}, \mathbf{x}_n \rangle + b \ge 0,
\]
- Negative examples (\(y_n = -1\)) to be on the negative side:
\[
\langle \mathbf{w}, \mathbf{x}_n \rangle + b < 0.
\]
Both conditions can be combined compactly as: \[ y_n (\langle \mathbf{w}, \mathbf{x}_n \rangle + b) \ge 0. \] This inequality expresses that all examples are correctly classified relative to the hyperplane defined by \((\mathbf{w}, b)\), with the boundary case \(f(\mathbf{x}_n) = 0\) counted as positive, consistent with the classification rule above.
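Checking this combined condition over a data set is a one-liner in NumPy. A sketch, assuming labels are stored as \(\pm 1\) (the function `separates` is illustrative):

```python
import numpy as np

def separates(w: np.ndarray, b: float, X: np.ndarray, y: np.ndarray) -> bool:
    """Check y_n * (<w, x_n> + b) >= 0 for every training example.
    X has shape (N, D); y contains labels in {-1, +1}."""
    margins = y * (X @ w + b)
    return bool(np.all(margins >= 0))

# Toy data in R^2 separated by the hyperplane x1 + x2 - 1 = 0
X = np.array([[2.0, 1.0], [1.5, 1.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([+1, +1, -1, -1])
print(separates(np.array([1.0, 1.0]), -1.0, X, y))  # True
```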
8.1.3 Geometric Interpretation
- The vector \(\mathbf{w}\) determines the orientation of the hyperplane.
- The scalar \(b\) shifts the hyperplane along the direction of \(\mathbf{w}\); the hyperplane sits at signed distance \(-b / \lVert \mathbf{w} \rVert\) from the origin.
- Classification is performed by checking on which side of the hyperplane each example lies.
- The SVM’s goal is to find the hyperplane that maximizes the margin — the distance between the hyperplane and the nearest data points from each class.
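Using the standard fact that the distance from a point \(\mathbf{x}\) to the hyperplane is \(|f(\mathbf{x})| / \lVert \mathbf{w} \rVert\), the margin attained by a given \((\mathbf{w}, b)\) on a data set can be sketched as follows (the function `geometric_margin` is illustrative; how to maximize this quantity is not shown here):

```python
import numpy as np

def geometric_margin(w: np.ndarray, b: float, X: np.ndarray, y: np.ndarray) -> float:
    """Smallest signed distance y_n * f(x_n) / ||w|| over the data.
    Positive if and only if the hyperplane separates the two classes."""
    return float(np.min(y * (X @ w + b)) / np.linalg.norm(w))

# Same toy data as before: distance of the closest point to x1 + x2 - 1 = 0
X = np.array([[2.0, 1.0], [1.5, 1.0], [0.0, 0.0], [-1.0, 0.5]])
y = np.array([+1, +1, -1, -1])
print(geometric_margin(np.array([1.0, 1.0]), -1.0, X, y))  # ~0.707
```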