Non-linear SVM and Kernel Function in Machine Learning

Nonlinear SVMs: Feature Space

Nonlinear SVMs: The Kernel Trick

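- With a mapping Φ: x → Φ(x) from the input space into a higher-dimensional feature space, the discriminant function becomes:
       g(x) = w · Φ(x) + b = Σ_{i ∈ SV} α_i y_i Φ(x_i) · Φ(x) + b
  where the sum runs over the support vectors x_i, with labels y_i and multipliers α_i.
- Only the dot product of feature vectors is used in both training and testing, so Φ never has to be evaluated explicitly.
- A kernel function is a function that corresponds to a dot product of two feature vectors in some expanded feature space:
       K(x_a, x_b) = Φ(x_a) · Φ(x_b)
Often K(x_a, x_b) is very inexpensive to compute even when Φ(x_a) is extremely high-dimensional.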
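
As a rough illustration of a discriminant function that touches the data only through a kernel, here is a minimal Python sketch on the XOR toy problem. The kernel (1 + a · b)^2, the four support vectors, the multipliers of 1/8 and the zero bias are all assumptions taken from the well-known textbook solution to this toy problem. They are not produced by a solver here, and the function names are illustrative.

```python
def poly_kernel(a, b, p=2):
    """Polynomial kernel K(a, b) = (1 + a . b)^p."""
    return (1.0 + sum(ai * bi for ai, bi in zip(a, b))) ** p


def decision(x, support_vectors, labels, alphas, b=0.0, kernel=poly_kernel):
    """Kernelized discriminant: g(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(
        a * y * kernel(sv, x)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + b


# XOR is not linearly separable in the input space.
# Assumed (hand-specified) solution: all four points are support vectors.
support_vectors = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels = [-1, 1, 1, -1]
alphas = [0.125, 0.125, 0.125, 0.125]

scores = [decision(x, support_vectors, labels, alphas) for x in support_vectors]
for x, y, g in zip(support_vectors, labels, scores):
    print(x, "label", y, "g(x) =", g)
```

The sign of g(x) matches the label at every point, even though no feature vector Φ(x) is ever built.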

Kernel Example

2-dimensional vector x = [x_1, x_2]
Let K(x_i, x_j) = (1 + x_i · x_j)^2
We need to show that K(x_i, x_j) = Φ(x_i) · Φ(x_j) for some feature map Φ.
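
One way to see this is to expand the square: (1 + x_i1 x_j1 + x_i2 x_j2)^2 = 1 + x_i1^2 x_j1^2 + 2 x_i1 x_j1 x_i2 x_j2 + x_i2^2 x_j2^2 + 2 x_i1 x_j1 + 2 x_i2 x_j2, which is the dot product of Φ(x_i) and Φ(x_j) with Φ(x) = [1, x_1^2, √2 x_1 x_2, x_2^2, √2 x_1, √2 x_2]. The short Python sketch below checks this numerically. The function names and the two sample points are illustrative choices, not part of the original example.

```python
import math


def kernel(xi, xj):
    """K(xi, xj) = (1 + xi . xj)^2, computed in the 2-D input space."""
    return (1.0 + xi[0] * xj[0] + xi[1] * xj[1]) ** 2


def phi(x):
    """Explicit 6-D feature map whose dot product reproduces the kernel."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, x1 * x1, r2 * x1 * x2, x2 * x2, r2 * x1, r2 * x2]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


xi, xj = (1.0, 2.0), (3.0, -1.0)
k_direct = kernel(xi, xj)            # one dot product in 2-D, then a square
k_mapped = dot(phi(xi), phi(xj))     # dot product in the 6-D feature space
print(k_direct, k_mapped)
```

Both values agree up to floating-point rounding, but the kernel gets there without ever constructing the 6-dimensional vectors.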

Commonly-used kernel functions
  • Linear kernel: K(x_i, x_j) = x_i · x_j
  • Polynomial of power p: K(x_i, x_j) = (1 + x_i · x_j)^p
  • Gaussian (radial-basis function): K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))
  • Sigmoid: K(x_i, x_j) = tanh(β_0 x_i · x_j + β_1)
In general, functions that satisfy Mercer's condition can be kernel functions.
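
The kernels listed above can be sketched in plain Python as follows. The Gaussian kernel uses the common exp(-||a - b||^2 / (2σ^2)) parameterization, and all default parameter values (p, sigma, beta0, beta1) are illustrative assumptions rather than recommended settings.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_kernel(a, b):
    return dot(a, b)


def polynomial_kernel(a, b, p=2):
    return (1.0 + dot(a, b)) ** p


def gaussian_kernel(a, b, sigma=1.0):
    # Squared Euclidean distance between a and b.
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(a, b, beta0=1.0, beta1=0.0):
    return math.tanh(beta0 * dot(a, b) + beta1)


a, b = (1.0, 2.0), (3.0, -1.0)
for k in (linear_kernel, polynomial_kernel, gaussian_kernel, sigmoid_kernel):
    print(k.__name__, k(a, b))
```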

Kernel Functions
  • A kernel function can be thought of as a similarity measure between the input objects.
  • Not every similarity measure can be used as a kernel function.
  • Mercer's condition states that any positive semi-definite kernel K(x, y), i.e. one for which
             Σ_i Σ_j K(x_i, x_j) c_i c_j ≥ 0
    for any finite set of points x_i and real coefficients c_i, can be expressed as a dot product in a high-dimensional space.
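
The quadratic form in Mercer's condition can be probed numerically. The sketch below evaluates it for a Gaussian kernel on random points and coefficients, and then for a plausible-looking similarity measure (negative Euclidean distance) that fails the condition. The helper names, the random seed and the sample sizes are arbitrary assumptions. A random probe like this can expose a violation, but it cannot prove that a function is a valid kernel.

```python
import math
import random


def gaussian_kernel(a, b, sigma=1.0):
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def neg_distance(a, b):
    """A similarity measure (closer means larger) that is NOT a Mercer kernel."""
    return -math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def quadratic_form(points, coeffs, kernel):
    """sum_i sum_j c_i * c_j * K(x_i, x_j)."""
    return sum(
        ci * cj * kernel(xi, xj)
        for xi, ci in zip(points, coeffs)
        for xj, cj in zip(points, coeffs)
    )


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(8)]

# Gaussian kernel: the form stays non-negative (up to rounding) in every trial.
gaussian_min = min(
    quadratic_form(points, [rng.uniform(-1, 1) for _ in points], gaussian_kernel)
    for _ in range(100)
)

# Negative distance: two points with equal coefficients already give a negative value.
bad_value = quadratic_form([(0.0, 0.0), (1.0, 0.0)], [1.0, 1.0], neg_distance)

print(gaussian_min, bad_value)
```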
SVM Examples

Nonlinear SVM: Optimization

  • Support Vector Machines work very well in practice.
        - The user must choose the kernel function and its parameters
  • They can be expensive in time and space for big datasets.
        - Computing the maximum-margin hyperplane has a cost that grows with the square of the number of training cases.
        - All the support vectors must be stored.
  • The kernel trick can also be used to do PCA in a much higher-dimensional space, thus giving a non-linear version of PCA in the original space.
Multi-class classification
  • A basic SVM can only handle two-class outputs.
  • For N classes, learn N SVMs:
         - SVM 1 learns Class 1 vs REST
         - SVM 2 learns Class 2 vs REST
         - ...
         - SVM N learns Class N vs REST
  • Then, to predict the output for a new input, predict with each SVM and pick the class whose SVM puts the prediction furthest into the positive region.
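
The prediction rule above can be sketched as follows. Each per-class SVM is stood in for by a hypothetical linear scoring function, so the weights and the helper names make_linear_scorer and predict_one_vs_rest are assumptions for illustration. In practice each scorer would be the decision function of a trained "class k vs REST" SVM.

```python
def make_linear_scorer(w, b):
    """Stand-in for a trained binary SVM: returns g(x) = w . x + b."""
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return score


def predict_one_vs_rest(x, scorers):
    """Pick the class whose scorer pushes x furthest into the positive region."""
    scores = [g(x) for g in scorers]
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best, scores


# Three hypothetical "class k vs REST" scorers for 2-D inputs.
scorers = [
    make_linear_scorer((1.0, 0.0), 0.0),    # class 0 vs REST
    make_linear_scorer((0.0, 1.0), 0.0),    # class 1 vs REST
    make_linear_scorer((-1.0, -1.0), 0.0),  # class 2 vs REST
]

label_a, scores_a = predict_one_vs_rest((2.0, 0.5), scorers)
label_b, scores_b = predict_one_vs_rest((-1.0, -2.0), scorers)
print(label_a, scores_a)
print(label_b, scores_b)
```

Comparing raw decision values across classifiers assumes their scales are comparable, which is the usual simplification in a one-vs-rest setup.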
