# What are support vector machines? A worked example

Support Vector Machines (SVMs) are a common method for binary classification and regression. There are many resources online that attempt to explain how SVMs work, but few include an example with actual numbers. In this section, I explain how an SVM works with a concrete example that readers can compare their own calculations against, in order to make sure they understand SVMs correctly.

#### Data setup

We will try to separate the points $$x_1=(2,1)$$ and $$x_2=(1,2)$$ by hand. To that end, we assign them the labels $$y_1=1$$ and $$y_2=-1$$, respectively.

#### Optimal separating hyperplane

Our goal is to find the hyperplane that separates these points in the best possible way, i.e. the hyperplane that leaves the largest margin between itself and the closest points: the **optimal separating hyperplane**.

We saw in the previous chapter that any hyperplane can be represented by

$$\begin{eqnarray} wx+b=0 \end{eqnarray}$$

- $$\textbf{w}$$ is normal to the hyperplane.
- $$\frac{b}{\|\textbf{w}\|}$$ is the perpendicular distance from the hyperplane to the origin.

Support Vectors are the points (vectors) closest to the separating hyperplane. In our case, we only have two vectors, so both will be Support Vectors. In the above representation of hyperplanes, both $$\textbf{w}$$ and $$b$$ are unknowns. How do we retrieve them?

In the previous chapter, we derived the so-called *dual form*: a quadratic objective function with linear constraints. The dual form reads:

$$\begin{eqnarray} \underset{\alpha}{\max} \ L_D&=&\sum\limits^{L}_{i=1}\alpha_i -\tfrac{1}{2}\sum\limits_{i,j}\alpha_i\alpha_j y_i y_j\langle x_i,x_j\rangle \nonumber \\ \text{subject to}&\ & \sum\limits^{L}_{i=1}\alpha_i y_i=0\nonumber \\ \nonumber&\ & \alpha_i\geq 0 \ \forall i\end{eqnarray} $$

#### Solving the quadratic programming problem in order to retrieve $$\alpha_1$$ and $$\alpha_2$$

This boils down to solving a QP (Quadratic programming) problem.

Note: Quadratic Programming

**Quadratic programming (QP)** is a special kind of mathematical optimization problem: finding a vector $$x$$ that minimizes or maximizes an **objective function** subject to bound, linear equality, and linear inequality constraints. $$\begin{eqnarray}\underset{x}{\min}\big(\tfrac{1}{2}x^THx+f^Tx\big) \ \ &&(\text{objective function}) \nonumber \\ Ax\leq b \ \ &&(\text{inequality constraint}) \nonumber \\ A_{eq}x= b_{eq} \ \ &&(\text{equality constraint}) \nonumber \\ lb\leq x \leq ub \ \ &&(\text{bound constraint}) \nonumber \\\end{eqnarray}$$

Solving quadratic programming problems is not trivial, but there are many software packages containing solvers.

$$\begin{eqnarray} \underset{\alpha}{\max}\ L_D&=&\sum\limits^{L}_{i=1}\alpha_i -\tfrac{1}{2}\sum\limits_{i,j}\alpha_i\alpha_j y_i y_j\langle x_i,x_j\rangle \\ \nonumber &=& \alpha_1+\alpha_2 -\tfrac{1}{2}\Big(\alpha_1\alpha_1\cdot 1\cdot 1\cdot \langle \begin{pmatrix}2 \\ 1\end{pmatrix},\begin{pmatrix}2 \\ 1\end{pmatrix}\rangle +\\ \nonumber & & +2\cdot\alpha_1\alpha_2\cdot 1\cdot(-1)\cdot \langle \begin{pmatrix}1 \\ 2\end{pmatrix},\begin{pmatrix}2 \\ 1\end{pmatrix}\rangle+\\ \nonumber & & +\alpha_2\alpha_2\cdot(-1)\cdot(-1)\cdot \langle \begin{pmatrix}1 \\ 2\end{pmatrix},\begin{pmatrix}1 \\ 2\end{pmatrix}\rangle\Big) \\ \nonumber &=& \alpha_1+\alpha_2-\tfrac{1}{2}\big(5\alpha_1^2-8\alpha_1\alpha_2+5\alpha_2^2\big)\nonumber \\ \text{subject to} \ &\ &\alpha_1 y_1+\alpha_2 y_2=0 \nonumber\\ &\ & \alpha_1\geq 0, \ \alpha_2\geq 0\nonumber\end{eqnarray}$$

In order to solve this quadratic programming problem, we can plug the objective function and constraints into any QP-capable solver, for example Wolfram Alpha (substituting $$\alpha_1=x$$ and $$\alpha_2=y$$).

The Lagrange multipliers solving our quadratic programming problem $$(2)$$ are therefore $$\alpha_1=1$$ and $$\alpha_2=1$$.
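As a sanity check, the same maximization can be sketched in Python with SciPy's general-purpose `scipy.optimize.minimize` (not a dedicated QP solver, but sufficient for a two-variable problem like ours):

```python
import numpy as np
from scipy.optimize import minimize

# Training data and labels from the example above.
X = np.array([[2.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, -1.0])

# Gram matrix of inner products <x_i, x_j>.
K = X @ X.T

def neg_dual(alpha):
    # Negative of L_D, because scipy minimizes rather than maximizes.
    return -(alpha.sum()
             - 0.5 * np.sum(np.outer(alpha, alpha) * np.outer(y, y) * K))

constraints = {"type": "eq", "fun": lambda a: a @ y}  # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * len(y)                       # alpha_i >= 0

res = minimize(neg_dual, x0=np.zeros(2), bounds=bounds, constraints=constraints)
print(res.x)  # should be close to [1, 1]
```

The dual objective is concave here (its Hessian is negative definite), so the solver converges to the unique global maximum $$\alpha_1=\alpha_2=1$$.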

#### Calculating $$w$$ and $$b$$

The bulk of the work is done now! What's left to do is to finally calculate **w** and **b**. Here's how:

$$\begin{eqnarray} w=\sum\limits^{L}_{i=1}\alpha_i y_i x_i&=&1\cdot 1\cdot\begin{pmatrix}2 \\ 1\end{pmatrix}+1\cdot (-1)\cdot \begin{pmatrix}1 \\ 2\end{pmatrix}\nonumber \\&=&\begin{pmatrix}2 \\ 1\end{pmatrix}-\begin{pmatrix}1 \\ 2\end{pmatrix}\nonumber \\ &=&\begin{pmatrix}1 \\ -1\end{pmatrix} \nonumber\end{eqnarray}$$

$$\begin{eqnarray} b=y_s-\sum\limits_{m\in S} \alpha_m y_m \langle x_m,x_s\rangle &=& 1-\big(1\cdot 1\cdot\langle\begin{pmatrix}2 \\ 1\end{pmatrix},\begin{pmatrix}2 \\ 1\end{pmatrix}\rangle \nonumber\\ &+& 1\cdot (-1)\cdot\langle\begin{pmatrix}2 \\ 1\end{pmatrix},\begin{pmatrix}1 \\ 2\end{pmatrix}\rangle\big)\nonumber \\ &=& 0 \nonumber\end{eqnarray}$$
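The same two calculations can be sketched in a few lines of NumPy, using $$x_1$$ as the support vector $$x_s$$:

```python
import numpy as np

# Lagrange multipliers, data points, and labels from the example above.
alpha = np.array([1.0, 1.0])
X = np.array([[2.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, -1.0])

# w = sum_i alpha_i y_i x_i
w = (alpha * y) @ X
print(w)  # [ 1. -1.]

# b = y_s - sum_m alpha_m y_m <x_m, x_s>, with x_s = x_1 (index 0)
s = 0
b = y[s] - np.sum(alpha * y * (X @ X[s]))
print(b)  # 0.0
```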

Our hyperplane $$(1)$$, therefore, is:

\begin{equation} \begin{pmatrix}1 \\ -1\end{pmatrix}x=0 \nonumber\end{equation}

And that's it! By the way, writing $$x=(x_1,x_2)$$ for the components of $$x$$, you can rewrite that by evaluating the left-hand side: $$\begin{equation}x_1-x_2 = 0 \Leftrightarrow x_1=x_2\nonumber\end{equation}$$ This is the diagonal line, which lies exactly halfway between our two points.
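As a final check, the predicted label of a point is the sign of the decision function $$wx+b$$; both training points should land on their assigned side of the hyperplane:

```python
import numpy as np

# Hyperplane parameters derived above.
w, b = np.array([1.0, -1.0]), 0.0

# sign(w·x + b) is the predicted label for a point x.
for point, label in [((2, 1), 1), ((1, 2), -1)]:
    f = w @ np.array(point, dtype=float) + b
    assert np.sign(f) == label
    print(point, "->", int(np.sign(f)))
```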

