What are support vector machines? A worked example

Support Vector Machines (SVMs) are a common method for binary classification and regression. There are plenty of resources online that attempt to explain how SVMs work, but few that include an example with actual numbers. In this section, I explain how an SVM works with a concrete example, so you can check your own calculations against it and make sure you understand SVMs correctly.

In a previous chapter, I went over the theory of SVMs in great detail. Even though this chapter also works on its own, I recommend reading that one first.

Data setup

We will try to separate the points $$x_1=(2,1)$$ and $$x_2=(1,2)$$ by hand. To that end, we assign them the labels $$y_1=1$$ and $$y_2=-1$$, as seen in the picture below.
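If you want to follow along in code, here is a minimal sketch of the same setup in Python (the array names are my own choice and not part of the derivation):

```python
import numpy as np

# The two training points and their labels.
x = np.array([[2.0, 1.0],   # x_1, labelled +1
              [1.0, 2.0]])  # x_2, labelled -1
y = np.array([1.0, -1.0])
```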

Optimal separating hyperplane

Our goal is to find a hyperplane that separates these points in the best possible way. This means we look for the hyperplane that leaves the largest margin between itself and the closest points, i.e. the optimal separating hyperplane.
We saw in the previous chapter that any hyperplane can be represented by

$$\begin{eqnarray} \textbf{w}\cdot\textbf{x}+b=0 \end{eqnarray}$$

  • $$\textbf{w}$$ is normal to the hyperplane.
  • $$\frac{b}{\|\textbf{w}\|}$$ is the perpendicular distance from the hyperplane to the origin.

Support Vectors are the points (vectors) closest to the separating hyperplane. In our case, we only have two vectors, so both will be Support Vectors. In the above representation of the hyperplane, both $$\textbf{w}$$ and $$b$$ are unknowns. How do we find them?

In the previous chapter, we derived the so-called dual form. The dual form consists of a quadratic objective function with constraints and reads:

$$\begin{eqnarray} \underset{\alpha}{\max} \ L_D&=&\sum\limits^{L}_{i=1}\alpha_i -\tfrac{1}{2}\sum\limits_{i,j}\alpha_i\alpha_j y_i y_j\langle x_i,x_j\rangle \nonumber \\ \text{subject to}&\ & \sum\limits^{L}_{i=1}\alpha_i y_i=0\nonumber \\ \nonumber&\ & \alpha_i\geq 0 \ \forall i\end{eqnarray} $$

Solving the quadratic programming problem in order to retrieve $$\alpha_1$$ and $$\alpha_2$$

This boils down to solving a quadratic programming (QP) problem.

Note: Quadratic Programming

Quadratic programming (QP) is a special kind of mathematical optimization problem. It is about finding a vector $$x$$ that minimizes or maximizes a quadratic objective function subject to bound, linear equality, and linear inequality constraints: $$\begin{eqnarray}\underset{x}{\min}\ \big(\tfrac{1}{2}x^THx+f^Tx\big) \ \ &&(\text{objective function})\nonumber \\ Ax\leq b \ \ &&(\text{inequality constraint}) \nonumber \\ A_{eq}x= b_{eq} \ \ &&(\text{equality constraint}) \nonumber \\ lb\leq x \leq ub \ \ &&(\text{bound constraint}) \nonumber\end{eqnarray}$$
Solving quadratic programming problems is not trivial, but there are many software packages containing solvers.

Let's insert our numbers into $$L_D$$:
$$\begin{eqnarray} \underset{\alpha}{\max}\ L_D&=&\sum\limits^{L}_{i=1}\alpha_i -\tfrac{1}{2}\sum\limits_{i,j}\alpha_i\alpha_j y_i y_j\langle x_i,x_j\rangle \\ \nonumber &=& \alpha_1+\alpha_2 -\tfrac{1}{2}\Big(\alpha_1\alpha_1\cdot 1\cdot 1\cdot \langle \begin{pmatrix}2 \\ 1\end{pmatrix},\begin{pmatrix}2 \\ 1\end{pmatrix}\rangle +\\ \nonumber & & +2\cdot\alpha_1\alpha_2\cdot 1\cdot(-1)\cdot \langle \begin{pmatrix}1 \\ 2\end{pmatrix},\begin{pmatrix}2 \\ 1\end{pmatrix}\rangle+\\ \nonumber & & +\alpha_2\alpha_2\cdot(-1)\cdot(-1)\cdot \langle \begin{pmatrix}1 \\ 2\end{pmatrix},\begin{pmatrix}1 \\ 2\end{pmatrix}\rangle\Big) \\ \nonumber &=& \alpha_1+\alpha_2-\tfrac{1}{2}\big(5\alpha_1^2-8\alpha_1\alpha_2+5\alpha_2^2\big)\nonumber \\ \text{subject to} \ &\ &\alpha_1 y_1+\alpha_2 y_2=0 \nonumber\\ &\ & \alpha_1\geq 0, \ \alpha_2\geq 0\nonumber\end{eqnarray}$$
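Because this particular problem is so small, we can already see what to expect before invoking a solver: the equality constraint forces $$\alpha_1=\alpha_2$$, which reduces $$L_D$$ to a function of a single variable:
$$\begin{eqnarray} \alpha_1=\alpha_2=:\alpha \ &\Rightarrow& \ L_D=2\alpha-\tfrac{1}{2}\big(5\alpha^2-8\alpha^2+5\alpha^2\big)=2\alpha-\alpha^2 \nonumber \\ \frac{dL_D}{d\alpha}=2-2\alpha=0 \ &\Rightarrow& \ \alpha=1 \nonumber\end{eqnarray}$$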

In order to solve this quadratic programming problem, we make use of a solver that I built with Wolfram Alpha: it maximizes an objective function subject to arbitrary constraints. In the widget below, our optimization problem is already entered (note that $$\alpha_1=x$$ and $$\alpha_2=y$$).

The Lagrange multipliers solving our quadratic programming problem $$(2)$$ are therefore $$\alpha_1=1$$ and $$\alpha_2=1$$ (in the widget, the solution is listed under the paragraph “Global Maximum”).
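If you would rather reproduce this result locally than rely on the widget, here is a rough sketch using SciPy's general-purpose minimizer; the solver call is real SciPy, but the way the problem is written down is simply my own translation of $$(2)$$:

```python
import numpy as np
from scipy.optimize import minimize

# Dual objective L_D(a) = a1 + a2 - 1/2 (5 a1^2 - 8 a1 a2 + 5 a2^2).
# SciPy minimizes, so we return the negative of L_D.
def neg_dual(a):
    a1, a2 = a
    return -(a1 + a2 - 0.5 * (5 * a1**2 - 8 * a1 * a2 + 5 * a2**2))

# Constraint sum_i alpha_i y_i = 0 with y = (1, -1), and alpha_i >= 0.
constraints = [{"type": "eq", "fun": lambda a: a[0] - a[1]}]
bounds = [(0, None), (0, None)]

result = minimize(neg_dual, x0=[0.5, 0.5], method="SLSQP",
                  bounds=bounds, constraints=constraints)
print(result.x)  # approximately [1. 1.], i.e. alpha_1 = alpha_2 = 1
```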

Calculating $$w$$ and $$b$$

The bulk of the work is done now! What's left is to calculate $$\textbf{w}$$ and $$b$$. Here's how to do it:
$$\begin{eqnarray} w=\sum\limits^{L}_{i=1}\alpha_i y_i x_i&=&1\cdot 1\cdot\begin{pmatrix}2 \\ 1\end{pmatrix}+1\cdot (-1)\cdot \begin{pmatrix}1 \\ 2\end{pmatrix}\nonumber \\&=&\begin{pmatrix}2 \\ 1\end{pmatrix}-\begin{pmatrix}1 \\ 2\end{pmatrix}\nonumber \\ &=&\begin{pmatrix}1 \\ -1\end{pmatrix} \nonumber\end{eqnarray}$$
$$\begin{eqnarray} b=y_s-\sum\limits_{m\in S} \alpha_m y_m \langle x_m,x_s\rangle &=& 1-\big(1\cdot 1\cdot\langle\begin{pmatrix}2 \\ 1\end{pmatrix},\begin{pmatrix}2 \\ 1\end{pmatrix}\rangle \nonumber\\ &+& 1\cdot (-1)\cdot\langle\begin{pmatrix}2 \\ 1\end{pmatrix},\begin{pmatrix}1 \\ 2\end{pmatrix}\rangle\big)\nonumber \\ &=& 0 \nonumber\end{eqnarray}$$
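The same two sums are easy to check in code. This is a small sketch that reuses the data from above together with the multipliers we just found:

```python
import numpy as np

x = np.array([[2.0, 1.0], [1.0, 2.0]])  # x_1 and x_2
y = np.array([1.0, -1.0])               # their labels
alpha = np.array([1.0, 1.0])            # Lagrange multipliers from the QP

# w = sum_i alpha_i * y_i * x_i
w = (alpha * y) @ x
print(w)  # [ 1. -1.]

# b = y_s - sum_m alpha_m * y_m * <x_m, x_s>, using x_1 as the support vector x_s
s = 0
b = y[s] - np.sum(alpha * y * (x @ x[s]))
print(b)  # 0.0
```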
Our hyperplane $$(1)$$, therefore, is:
\begin{equation} \begin{pmatrix}1 \\ -1\end{pmatrix}\cdot\textbf{x}=0 \nonumber\end{equation}
And that's it! By the way, evaluating the left-hand side for a point $$\textbf{x}=(x_1,x_2)$$ (where $$x_1$$ and $$x_2$$ now denote the coordinates of $$\textbf{x}$$, not the training points) lets you rewrite the hyperplane as: $$\begin{equation}x_1-x_2 = 0 \Leftrightarrow x_1=x_2\end{equation}$$
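As a final sanity check, you can plug both training points into the decision function $$\textbf{w}\cdot\textbf{x}+b$$ and confirm that its sign reproduces the labels we assigned; a minimal sketch:

```python
import numpy as np

w = np.array([1.0, -1.0])
b = 0.0
x = np.array([[2.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, -1.0])

# sign(w . x_i + b) should match the label y_i of each point
for xi, yi in zip(x, y):
    score = np.dot(w, xi) + b
    print(xi, score, np.sign(score) == yi)
# [2. 1.]  1.0 True
# [1. 2.] -1.0 True
```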
