3.3 Convex Optimization

Convex optimization focuses on a special class of optimization problems where global optimality can be guaranteed. These problems occur when the objective function is convex and the constraints define convex sets. In this setting, under mild regularity conditions (such as Slater's condition), strong duality holds, meaning the optimal values of the primal and dual problems are equal.


3.3.1 Convex Sets and Functions

Definition 3.6 A convex set \(C\) is one in which the line segment between any two points lies entirely within the set: \[ \theta x + (1 - \theta)y \in C, \quad \text{for all } x, y \in C \text{ and } 0 \le \theta \le 1. \]

Example 3.10 A Convex Set in \(\mathbb{R}^2\)

Consider the set \[ C = \left\{ (x,y) \in \mathbb{R}^2 \;\middle|\; x^2 + y^2 \le 4 \right\}, \] which is the closed disk of radius 2 centered at the origin. We will demonstrate that the set is convex using two example points. A complete proof requires selecting arbitrary points in \(C\).

Let \[ \mathbf{x} = (1,1), \qquad \mathbf{y} = (-1,0). \] For any \(0 \le \theta \le 1\), consider the convex combination of the points \[ \theta \mathbf{x} + (1-\theta)\mathbf{y} = \theta(1,1) + (1-\theta)(-1,0) = (2\theta - 1,\; \theta). \] We compute the squared norm: \[ (2\theta - 1)^2 + \theta^2 = 4\theta^2 - 4\theta + 1 + \theta^2 = 5\theta^2 - 4\theta + 1. \] For \(0 \le \theta \le 1\), \[ 5\theta^2 - 4\theta + 1 \le 2 < 4. \] Thus, \[ (2\theta - 1,\; \theta) \in C \quad \text{for all } 0 \le \theta \le 1. \] Every convex combination of these two points remains in \(C\); repeating the same calculation with arbitrary points in \(C\) shows that the set is convex. Geometrically, this means every line segment between two points inside the disk stays entirely inside the disk, illustrating the definition of convexity for a set.
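
The same membership check can be carried out numerically. The sketch below is a minimal illustration of ours, not part of the formal argument; the helper `in_disk` and the sampling scheme are our own choices. It samples pairs of points inside the disk and confirms that their convex combinations stay inside.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_disk(p, radius=2.0):
    """Membership test for the closed disk x^2 + y^2 <= radius^2."""
    return p @ p <= radius**2 + 1e-12  # small tolerance for floating point

# Sample candidate pairs, keep those inside the disk, and verify that every
# convex combination theta*x + (1 - theta)*y remains inside.
for _ in range(1000):
    x, y = rng.uniform(-2, 2, size=(2, 2))
    if not (in_disk(x) and in_disk(y)):
        continue
    for theta in np.linspace(0.0, 1.0, 11):
        assert in_disk(theta * x + (1 - theta) * y)
print("All sampled convex combinations stayed inside the disk.")
```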

Definition 3.7 A convex function \(f: \mathbb{R}^D \to \mathbb{R}\) satisfies: \[ f(\theta x + (1 - \theta)y) \le \theta f(x) + (1 - \theta) f(y) \quad \text{for all } x, y \in \mathbb{R}^D \text{ and } 0 \le \theta \le 1, \] which means the chord between any two points on the graph lies on or above the graph.

Example 3.11 A Convex Function in \(\mathbb{R}^2\)

Consider the function \[ f(x, y) = x^2 + y^2, \] defined on \(\mathbb{R}^2\). This function is a paraboloid, and we will verify with an example that it satisfies the definition of convexity. A more general proof would use arbitrary points in \(\mathbb{R}^2\).

Let \[ \mathbf{x} = (1, 2), \qquad \mathbf{y} = (-1, 0). \] Then \[ f(\mathbf{x}) = 1^2 + 2^2 = 5, \qquad f(\mathbf{y}) = (-1)^2 + 0^2 = 1. \] For \(0 \le \theta \le 1\), \[ \theta \mathbf{x} + (1-\theta)\mathbf{y} = \big(2\theta - 1,\; 2\theta\big). \] At this point, the function evaluates to \[ \begin{aligned} f(\theta \mathbf{x} + (1-\theta)\mathbf{y}) &= (2\theta - 1)^2 + (2\theta)^2 \\ &= 4\theta^2 - 4\theta + 1 + 4\theta^2 \\ &= 8\theta^2 - 4\theta + 1. \end{aligned} \]

The weighted average of the function values is \[ \theta f(\mathbf{x}) + (1-\theta)f(\mathbf{y}) = \theta(5) + (1-\theta)(1) = 4\theta + 1. \]

We check the defining inequality: \[ f(\theta \mathbf{x} + (1-\theta)\mathbf{y}) \le \theta f(\mathbf{x}) + (1-\theta)f(\mathbf{y}). \] This becomes: \[ 8\theta^2 - 4\theta + 1 \le 4\theta + 1, \] or equivalently, \[ 8\theta^2 - 8\theta \le 0 \quad \Longleftrightarrow \quad 8\theta(\theta - 1) \le 0. \] This inequality holds for all \(0 \le \theta \le 1\), so the convexity condition is satisfied for this pair of points; the analogous computation with arbitrary points shows that \(f\) is a convex function.

Geometrically, this confirms that the straight line connecting two points on the surface of the paraboloid lies on or above the surface itself, exactly matching the definition of a convex function.
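
The convexity inequality can likewise be checked numerically. Below is a minimal sketch of ours (the random points and mixing weights are arbitrary choices) that tests the defining inequality for \(f(x,y) = x^2 + y^2\).

```python
import numpy as np

def f(p):
    """The paraboloid f(x, y) = x^2 + y^2."""
    return p @ p

rng = np.random.default_rng(1)

# Test f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
# on randomly sampled points and mixing weights.
for _ in range(1000):
    x, y = rng.normal(size=(2, 2))
    theta = rng.uniform()
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    assert lhs <= rhs + 1e-12
print("Convexity inequality held on all sampled points.")
```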

A concave function is simply the negative of a convex function.

Convexity can also be checked using gradients or Hessians:

  • If \(f\) is differentiable, it is convex if and only if
    \[ f(y) \ge f(x) + \nabla f(x)^{\top}(y - x) \quad \text{for all } x, y. \]
  • If \(f\) is twice differentiable, it is convex if and only if
    \[ \nabla^2 f(x) \text{ is positive semidefinite for all } x. \]

The epigraph of a convex function — the region lying above its graph — is itself a convex set.
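
In practice the second-order test is often applied by inspecting the eigenvalues of the Hessian. The short sketch below is ours (the helper name `is_positive_semidefinite` is an assumption); it applies the test to \(f(x,y) = x^2 + y^2\), whose Hessian is the constant matrix \(2I\).

```python
import numpy as np

def is_positive_semidefinite(H, tol=1e-10):
    """Second-order convexity test: no eigenvalue of the symmetric Hessian is negative."""
    return bool(np.all(np.linalg.eigvalsh(H) >= -tol))

# Hessian of f(x, y) = x^2 + y^2 (constant, because f is quadratic).
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])
print(is_positive_semidefinite(H))  # True -> f is convex
```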

Example 3.12 Negative entropy: \(f(x) = x \log_2 x\) is convex for \(x > 0\), since \(f''(x) = \frac{1}{x \ln 2} > 0\) on that domain.

Example 3.13 Closure property: A nonnegative weighted sum of convex functions is convex.
For \(\alpha, \beta \ge 0\), if \(f_1\) and \(f_2\) are convex, then
\[ \alpha f_1(x) + \beta f_2(x) \text{ is also convex.} \] Extending the convexity inequality from two points to arbitrary convex combinations (and, more generally, to expectations) yields Jensen’s inequality.
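
As a quick numerical illustration of this closure property (a sketch of ours with arbitrarily chosen \(f_1\), \(f_2\), \(\alpha\), and \(\beta\)), the check below tests the defining inequality for \(\alpha f_1 + \beta f_2\) on sampled points.

```python
import numpy as np

f1 = lambda x: x**2          # convex
f2 = lambda x: np.abs(x)     # convex
alpha, beta = 2.0, 3.0       # nonnegative weights

h = lambda x: alpha * f1(x) + beta * f2(x)

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.normal(size=2)
    theta = rng.uniform()
    assert h(theta * x + (1 - theta) * y) <= theta * h(x) + (1 - theta) * h(y) + 1e-12
print("The nonnegative weighted sum passed the convexity check.")
```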


3.3.2 General Convex Optimization Problem

Definition 3.8 A convex optimization problem has the form: \[ \begin{aligned} \min_x \quad & f(x) \\ \text{subject to} \quad & g_i(x) \le 0, \quad i = 1, \ldots, m, \\ & h_j(x) = 0, \quad j = 1, \ldots, n, \end{aligned} \] where \(f\) and all \(g_i\) are convex functions, and the \(h_j\) are affine, so every constraint defines a convex set.

Example 3.14 Minimize the function \[ f(x, y) = x^2 + y^2 - 4x - 6y \] subject to the constraint \[ x + y = 4. \]

The objective function \(f(x,y)\) is convex because it is a quadratic function with a positive definite Hessian: \[ \nabla^2 f = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \succ 0. \] The constraint \(x + y = 4\) is affine, hence convex. Therefore, this is a convex optimization problem, and any local minimum is a global minimum.

We can use the constraint to eliminate one variable: \[ y = 4 - x. \] Substitute into \(f\): \[ \begin{aligned} f(x, 4-x) &= x^2 + (4-x)^2 - 4x - 6(4-x) \\ &= x^2 + 16 - 8x + x^2 - 4x - 24 + 6x \\ &= 2x^2 - 6x - 8. \end{aligned} \]

Differentiate with respect to \(x\): \[ \frac{d}{dx}(2x^2 - 6x - 8) = 4x - 6. \] Set equal to zero: \[ 4x - 6 = 0 \quad \Rightarrow \quad x = \frac{3}{2}. \] Find \(y\): \[ y = 4 - \frac{3}{2} = \frac{5}{2}. \] The minimum value is \[ \begin{aligned} f\!\left(\frac{3}{2}, \frac{5}{2}\right) &= \left(\frac{3}{2}\right)^2 + \left(\frac{5}{2}\right)^2 - 4\left(\frac{3}{2}\right) - 6\left(\frac{5}{2}\right) \\ &= \frac{9}{4} + \frac{25}{4} - 6 - 15 \\ &= \frac{34}{4} - 21 \\ &= -\frac{25}{2}. \end{aligned} \] This solution is the unique global minimizer due to convexity.
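
The same minimizer can be recovered numerically. A minimal sketch (ours, assuming SciPy is available) passes the equality constraint to `scipy.optimize.minimize` and should return approximately \(x = 1.5\), \(y = 2.5\) with objective value \(-12.5\).

```python
import numpy as np
from scipy.optimize import minimize

# Objective f(x, y) = x^2 + y^2 - 4x - 6y
def f(v):
    x, y = v
    return x**2 + y**2 - 4*x - 6*y

# Equality constraint x + y - 4 = 0
constraint = {"type": "eq", "fun": lambda v: v[0] + v[1] - 4}

result = minimize(f, x0=np.zeros(2), constraints=[constraint])
print(result.x, result.fun)  # approximately [1.5, 2.5] and -12.5
```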

Definition 3.9 A linear program (LP) has a linear objective and linear constraints: \[ \begin{aligned} \min_{\mathbf{x} \in \mathbb{R}^d} \quad & \mathbf{c}^{\top}\mathbf{x} \\ \text{subject to} \quad & \mathbf{A}\mathbf{x} \le \mathbf{b}. \end{aligned} \] Here \(\mathbf{A} \in \mathbb{R}^{m \times d}\) and \(\mathbf{b} \in \mathbb{R}^m\).
The dual problem is also a linear program: \[ \begin{aligned} \max_{\mathbf{\lambda} \ge 0} \quad & -\mathbf{b}^{\top}\mathbf{\lambda} \\ \text{subject to} \quad & \mathbf{c} + \mathbf{A}^{\top}\mathbf{\lambda} = 0. \end{aligned} \] Depending on whether \(d\) (number of variables) or \(m\) (number of constraints) is larger, we may solve the primal or dual form.

Example 3.15 Linear Programming with Primal, Dual, and Solution

Consider the linear program: \[ \begin{aligned} \text{maximize } \quad & z = 3x + 2y \\ \text{subject to } \quad & x + y \le 4, \\ & 2x + y \le 5, \\ & x \ge 0,\; y \ge 0. \end{aligned} \] This is a linear program because both the objective function and the constraints are linear.

The original problem is the primal problem. The feasible region defined by the constraints is a polygon whose corner points (vertices) are:

  • \((0,0)\)
  • \((0,4)\), where \(x = 0\) meets \(x+y=4\)
  • \((2.5,0)\), where \(y = 0\) meets \(2x+y=5\)
  • the intersection of \(x+y=4\) and \(2x+y=5\)

Solve the intersection: \[ \begin{aligned} x + y &= 4 \\ 2x + y &= 5 \end{aligned} \quad \Rightarrow \quad x = 1,\; y = 3. \] We evaluate the objective function at each corner:

| Point | \(z = 3x + 2y\) |
|---|---|
| \((0,0)\) | 0 |
| \((0,4)\) | 8 |
| \((2.5,0)\) | 7.5 |
| \((1,3)\) | 9 |

Therefore, the optimal solution is \[ x^* = 1,\quad y^* = 3,\quad z^* = 9 \]

Writing the primal in matrix form: let \[ \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}, \quad \mathbf{A} = \begin{bmatrix} 1 & 1 \\ 2 & 1 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}. \] Then the primal problem is: \[ \begin{aligned} \text{maximize } & \mathbf{c}^\top \mathbf{x} \\ \text{subject to } & \mathbf{A}\mathbf{x} \le \mathbf{b}, \\ & \mathbf{x} \ge 0. \end{aligned} \]

For a primal of the form \[ \max \{ \mathbf{c}^\top \mathbf{x} : \mathbf{A}\mathbf{x} \le \mathbf{b},\; \mathbf{x} \ge 0 \}, \] the dual is: \[ \min \{ \mathbf{b}^\top \mathbf{y} : \mathbf{A}^\top \mathbf{y} \ge \mathbf{c},\; \mathbf{y} \ge 0 \}. \]

Let \(\mathbf{y} = (y_1, y_2)^\top\). The dual is: \[ \begin{aligned} \text{minimize } \quad & 4y_1 + 5y_2 \\ \text{subject to } \quad & y_1 + 2y_2 \ge 3, \\ & y_1 + y_2 \ge 2, \\ & y_1 \ge 0,\; y_2 \ge 0. \end{aligned} \]

Check the intersection where both constraints are tight: \[ \begin{aligned} y_1 + 2y_2 &= 3 \\ y_1 + y_2 &= 2 \end{aligned} \quad \Rightarrow \quad y_1 = 1,\; y_2 = 1. \]

Compute the objective value: \[ 4(1) + 5(1) = 9. \] Therefore, \[ y_1^* = 1,\quad y_2^* = 1,\quad \text{minimum value} = 9 \]

The primal and dual have the same optimal value, as guaranteed by strong duality for linear programs.
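
Both the primal and the dual can be verified with a standard LP solver. A minimal sketch (ours, assuming SciPy): `scipy.optimize.linprog` minimizes by convention, so the primal maximization is passed with a negated objective, and the dual's \(\ge\) constraints are flipped to \(\le\).

```python
import numpy as np
from scipy.optimize import linprog

# Primal: maximize 3x + 2y  subject to  x + y <= 4,  2x + y <= 5,  x, y >= 0.
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
b = np.array([4.0, 5.0])
c = np.array([3.0, 2.0])

primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(primal.x, -primal.fun)   # approximately [1, 3] and 9

# Dual: minimize 4*y1 + 5*y2  subject to  A^T y >= c,  y >= 0,
# rewritten as -A^T y <= -c for linprog's <= convention.
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None), (0, None)])
print(dual.x, dual.fun)        # approximately [1, 1] and 9
```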


3.3.3 Quadratic Programming

Definition 3.10 A quadratic program (QP) minimizes a convex quadratic function under affine constraints: \[ \begin{aligned} \min_{x \in \mathbb{R}^d} \quad & \tfrac{1}{2}x^{\top}Qx + c^{\top}x \\ \text{subject to} \quad & Ax \le b, \end{aligned} \] where \(Q\) is symmetric positive definite, ensuring the objective is (strictly) convex.

Example 3.16 Consider the quadratic program: \[ \begin{aligned} \text{minimize } \quad & f(x, y) = x^2 + y^2 \\ \text{subject to } \quad & x + y = 1. \end{aligned} \]

This is a quadratic program because:

  • The objective function is quadratic.
  • The constraint is linear.

The objective can be written in matrix form as: \[ f(\mathbf{x}) = \mathbf{x}^\top \mathbf{Q}\mathbf{x}, \quad \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \mathbf{Q} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \] or equivalently as \(\tfrac{1}{2}\mathbf{x}^\top(2\mathbf{I})\mathbf{x}\) in the standard form of Definition 3.10 (the parameterization used in Example 3.17). Since \(\mathbf{Q}\) is positive definite, this is a convex quadratic program, and any local minimum is a global minimum.

Form the Lagrangian: \[ \mathcal{L}(x,y,\lambda) = x^2 + y^2 + \lambda(x + y - 1). \] Take partial derivatives: \[ \begin{aligned} \frac{\partial \mathcal{L}}{\partial x} &= 2x + \lambda = 0, \\ \frac{\partial \mathcal{L}}{\partial y} &= 2y + \lambda = 0, \\ \frac{\partial \mathcal{L}}{\partial \lambda} &= x + y - 1 = 0. \end{aligned} \] From the first two equations: \[ 2x = 2y \quad \Rightarrow \quad x = y. \] Substitute into the constraint: \[ x + x = 1 \quad \Rightarrow \quad x = \frac{1}{2}. \] Thus: \[ x^* = \frac{1}{2}, \quad y^* = \frac{1}{2}. \] This gives an optimal solution of \[ f\!\left(\frac{1}{2}, \frac{1}{2}\right) = \left(\frac{1}{2}\right)^2 + \left(\frac{1}{2}\right)^2 = \frac{1}{2}. \]
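
The stationarity conditions together with the constraint form a linear KKT system, which can be solved directly. A minimal sketch of ours, using plain NumPy rather than a dedicated QP solver:

```python
import numpy as np

# KKT system for minimizing x^2 + y^2 subject to x + y = 1:
#   2x + lam = 0,   2y + lam = 0,   x + y = 1.
K = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])

x, y, lam = np.linalg.solve(K, rhs)
print(x, y, lam)        # 0.5, 0.5, -1.0
print(x**2 + y**2)      # optimal value 0.5
```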

Definition 3.11 The dual form of the QP is: \[ \begin{aligned} \max_{\mathbf{\lambda} \ge 0} \quad & -\tfrac{1}{2}(c + A^{\top}\mathbf{\lambda})^{\top}Q^{-1}(c + A^{\top}\mathbf{\lambda}) - b^{\top}\mathbf{\lambda}. \end{aligned} \]

Example 3.17 We revisit the quadratic program: \[ \begin{aligned} \text{minimize } \quad & f(x,y) = x^2 + y^2 \\ \text{subject to } \quad & x + y = 1. \end{aligned} \]

In vector form, let \[ \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \mathbf{Q} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \quad \mathbf{A} = \begin{bmatrix} 1 & 1 \end{bmatrix}, \quad \mathbf{b} = 1. \]

Then the primal problem is: \[ \begin{aligned} \text{minimize } \quad & \frac{1}{2}\mathbf{x}^\top \mathbf{Q}\mathbf{x} \\ \text{subject to } \quad & \mathbf{A}\mathbf{x} = \mathbf{b}. \end{aligned} \]

Introduce a Lagrange multiplier \(\lambda \in \mathbb{R}\). The Lagrangian is: \[ \mathcal{L}(\mathbf{x}, \lambda) = \frac{1}{2}\mathbf{x}^\top \mathbf{Q}\mathbf{x} + \lambda(\mathbf{A}\mathbf{x} - \mathbf{b}). \] The dual function is obtained by minimizing the Lagrangian with respect to \(\mathbf{x}\): \[ g(\lambda) = \inf_{\mathbf{x}} \mathcal{L}(\mathbf{x}, \lambda). \] Take the gradient with respect to \(\mathbf{x}\): \[ \nabla_{\mathbf{x}} \mathcal{L} = \mathbf{Q}\mathbf{x} + \mathbf{A}^\top \lambda. \] Setting this to zero gives: \[ \mathbf{x} = -\mathbf{Q}^{-1}\mathbf{A}^\top \lambda. \] Since \[ \mathbf{Q}^{-1} = \begin{bmatrix} \frac{1}{2} & 0 \\ 0 & \frac{1}{2} \end{bmatrix}, \quad \mathbf{A}^\top = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \] we obtain: \[ \mathbf{x} = -\frac{\lambda}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \] Substitute this expression for \(\mathbf{x}\) into the Lagrangian. Using \(\frac{1}{2}\mathbf{x}^\top \mathbf{Q}\mathbf{x} = \|\mathbf{x}\|^2 = \frac{\lambda^2}{2}\) and \(\mathbf{A}\mathbf{x} - \mathbf{b} = -\lambda - 1\), \[ \begin{aligned} g(\lambda) &= \frac{1}{2}\mathbf{x}^\top \mathbf{Q}\mathbf{x} + \lambda(\mathbf{A}\mathbf{x} - \mathbf{b}) \\ &= \frac{\lambda^2}{2} - \lambda^2 - \lambda \\ &= -\frac{1}{2}\lambda^2 - \lambda. \end{aligned} \] Thus, the dual problem is: \[ \boxed{ \begin{aligned} \text{maximize } \quad & g(\lambda) = -\frac{1}{2}\lambda^2 - \lambda \\ \text{subject to } \quad & \lambda \in \mathbb{R}. \end{aligned} } \]

To solve the dual, differentiate: \[ \frac{d}{d\lambda} g(\lambda) = -\lambda - 1. \] Set to zero: \[ -\lambda - 1 = 0 \quad \Rightarrow \quad \lambda^* = -1. \] Using \[ \mathbf{x} = -\mathbf{Q}^{-1}\mathbf{A}^\top \lambda, \] we obtain: \[ \mathbf{x}^* = -\frac{-1}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}. \]

Therefore, the primal optimal solution is \[ \mathbf{x}^* = \begin{bmatrix} \frac{1}{2} \\ \frac{1}{2} \end{bmatrix}, \quad f^* = \frac{1}{2}, \] and the dual optimal solution is \[ \lambda^* = -1, \quad g(\lambda^*) = -\frac{1}{2}(-1)^2 - (-1) = \frac{1}{2}. \] The primal and dual optimal values coincide: strong duality holds because the problem is convex with linear equality constraints.
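
The dual derivation can be double-checked numerically. Below is a sketch of ours that evaluates \(g(\lambda)\) directly from the Lagrangian on a grid, locates its maximizer, and recovers the primal point via \(\mathbf{x} = -\mathbf{Q}^{-1}\mathbf{A}^\top \lambda\).

```python
import numpy as np

Q = np.array([[2.0, 0.0],
              [0.0, 2.0]])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def g(lam):
    """Dual function: Lagrangian evaluated at the minimizing x = -Q^{-1} A^T lambda."""
    x = -np.linalg.solve(Q, A.T * lam)
    return (0.5 * x.T @ Q @ x + lam * (A @ x - b)).item()

lams = np.linspace(-3.0, 1.0, 4001)
vals = np.array([g(l) for l in lams])
best = lams[np.argmax(vals)]
print(best, vals.max())                          # approximately -1.0 and 0.5

x_star = -np.linalg.solve(Q, A.T * best).ravel()
print(x_star)                                    # approximately [0.5, 0.5]
```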

Quadratic programs are common in machine learning — for example, in support vector machines.


3.3.4 Legendre–Fenchel Transform and Convex Conjugate

Convex functions can also be characterized using their supporting hyperplanes (tangents). The Legendre–Fenchel transform (or convex conjugate) formalizes this relationship.

Definition 3.12 For a function \(f: \mathbb{R}^D \to \mathbb{R}\), the convex conjugate is defined as: \[ f^*(s) = \sup_{x \in \mathbb{R}^D} \big( \langle s, x \rangle - f(x) \big). \]

If \(f\) is convex and differentiable, the supremum is attained at the point \(x\) satisfying \(s = \nabla f(x)\), so: \[ f^*(s) = s^{\top}x - f(x), \quad \text{where } x \text{ solves } \nabla f(x) = s. \]

For convex (and lower semicontinuous) functions, applying the transform twice returns the original function: \[ f^{**}(x) = f(x). \]
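
A small numerical check of the conjugate (ours; the grid-based `conjugate` helper is just a brute-force approximation of the supremum): for \(f(x) = x^2\) the analytic conjugate is \(f^*(s) = s^2/4\), which the code below approximates.

```python
import numpy as np

def conjugate(f, s, xs):
    """Brute-force approximation of f*(s) = sup_x (s*x - f(x)) over a grid xs."""
    return np.max(s * xs - f(xs))

f = lambda x: x**2
xs = np.linspace(-10.0, 10.0, 200001)

for s in [-2.0, 0.0, 1.0, 3.0]:
    approx = conjugate(f, s, xs)
    exact = s**2 / 4                  # analytic conjugate of x^2
    print(s, approx, exact)           # approx matches exact up to grid resolution
```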

3.3.5 Connection to Duality

Using the Legendre–Fenchel transform, a general convex optimization problem in the composite form \[ \min_x f(Ax) + g(x) \] can be expressed equivalently as a dual problem: \[ \max_u -f^*(u) - g^*(-A^{\top}u). \] This relationship, known as Fenchel duality, shows how convex conjugates naturally lead to the formulation of dual problems in optimization.
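
As a one-dimensional sanity check of this relationship (our own example, not from the text): take \(f(z) = \tfrac{1}{2}(z-1)^2\), \(g(x) = \tfrac{1}{2}x^2\), and \(A = 1\), whose conjugates are \(f^*(u) = \tfrac{1}{2}u^2 + u\) and \(g^*(v) = \tfrac{1}{2}v^2\). The brute-force comparison below should show the primal minimum and the dual maximum agreeing at \(\tfrac{1}{4}\).

```python
import numpy as np

# One-dimensional instance of min_x f(Ax) + g(x) with A = 1,
# f(z) = 0.5*(z - 1)^2 and g(x) = 0.5*x^2.
f = lambda z: 0.5 * (z - 1.0)**2
g = lambda x: 0.5 * x**2

# Conjugates, computed analytically for this particular example.
f_star = lambda u: 0.5 * u**2 + u
g_star = lambda v: 0.5 * v**2

grid = np.linspace(-5.0, 5.0, 100001)
primal = np.min(f(grid) + g(grid))             # min_x f(Ax) + g(x)
dual = np.max(-f_star(grid) - g_star(-grid))   # max_u -f*(u) - g*(-A^T u)
print(primal, dual)                            # both approximately 0.25
```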


Exercises

Exercise 3.11 Consider the linear program \[ \min_{\mathbf{x} \in \mathbb{R}^2} -\begin{bmatrix}5\\3 \end{bmatrix}^T\begin{bmatrix}x_1\\x_2 \end{bmatrix} \;\;\;\;\; \text{subject to} \;\;\;\;\; \begin{bmatrix} 2 & 2 \\ 2 & -4 \\ -2 & 1 \\ 0 & -1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix}x_1 \\ x_2 \end{bmatrix} \leq \begin{bmatrix} 33 \\ 8 \\ -5 \\ -1 \\ 8 \end{bmatrix}.\] Solve this problem showing your work. Derive the dual using Lagrange duality.

Exercise 3.12 Consider the linear program \[\begin{align*} \text{maximize } & P = 5x + 12y\\ \text{subject to } & 20x + 10y \leq 200\\ & 10x + 20 y \leq 120\\ & 10x + 30y \leq 150\\ &x \geq 0\\ &y \geq 0. \end{align*}\] Illustrate the problem. Find the optimal solution. Recreate this problem in matrix form. Derive the dual using Lagrange duality.

Exercise 3.13 Consider the linear program \[\begin{align*} \text{maximize } & P = 100x + 120y\\ \text{subject to } & x + y \leq 220\\ & 30x + 20y \leq 480\\ & x + 2y \leq 36\\ &x \geq 0\\ &y \geq 0. \end{align*}\] Illustrate the problem. Find the optimal solution. Recreate this problem in matrix form. Derive the dual using Lagrange duality.