A **machine learning (ML) algorithm** is essentially a process, or set of procedures, that helps a model adapt to the data given an objective. An ML algorithm normally specifies how the data is transformed from input to output and how the model learns the appropriate mapping between the two.

Thus, **ML algorithm = model + learning algorithm.**

The model specifies the mapping function and holds the parameters while the learning algorithm updates the parameters in an effort to help the model satisfy the objective.

**Examples of ML algorithms are:**

- Linear regression, which predicts a continuous value given the input. For example, given the dimensions of a house, output its price.
- Artificial neural network (ANN), which is an ensemble of processing nodes arranged layer by layer.
- Logistic regression, for classification into two or more categories. It is essentially a single-layer ANN, whereby each node in the layer represents a specific category in which the data can be classified.

- K-means clustering, which produces k centroids from the data.
- Agglomerative clustering, which can produce a variable number of clusters depending on the cut-off threshold.
- Self-organizing map (SOM), which results in a map of neurons such that neurons that are physically close together also fire on similar stimuli. With a small number of nodes, this can resemble k-means.
- Support vector machine (SVM), which learns a maximum-margin hyperplane.
- Kernel SVM, which extends the linear maximum-margin hyperplane to non-linearly separable problems using the kernel trick.
- And many more.
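To make one of the listed algorithms concrete, here is a minimal toy sketch of k-means clustering in numpy. All names (`kmeans`, the blob data) are illustrative, not from any library; a real application would use a tested implementation such as scikit-learn's.

```python
import numpy as np

def kmeans(X, k, n_iters=20, seed=0):
    """Toy k-means: learn k centroids from data X of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    # initialize centroids as k distinct random data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# two well-separated blobs around (0, 0) and (5, 5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
centroids, labels = kmeans(X, k=2)
```

With well-separated data like this, the two centroids settle near the blob centers after a few iterations.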

An objective can be something simple, like minimizing the squared error between the desired and actual output values, or something more complex, like picking up objects in robotic applications. But complex objectives can essentially be reduced to simpler ones for the sake of simplifying the training process.

The aim of a learning algorithm is to ensure that the model achieves the objective as accurately as possible.

Thus in general in a supervised setting the objective is in the form:

**L(x, y; w) = (1/n) Σᵢ l(f(xᵢ; w), yᵢ)**

where f() is the machine learning model specified by the parameters w, and l() is the loss function, which can be the cross-entropy, hinge, or squared error loss.
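The objective above can be written out directly in code. This is a minimal sketch assuming a linear model for f() and the squared error for l(); both choices are illustrative stand-ins, not fixed by the formula.

```python
import numpy as np

def f(x, w):
    # f(x; w): a linear model standing in for the generic model
    return w @ x

def squared_error(pred, target):
    # l(f(x_i; w), y_i): the per-example loss
    return (pred - target) ** 2

def objective(X, y, w):
    # L(x, y; w) = (1/n) * sum_i l(f(x_i; w), y_i)
    n = len(X)
    return sum(squared_error(f(xi, w), yi) for xi, yi in zip(X, y)) / n

X = np.array([[1.0, 2.0], [3.0, 1.0]])
y = np.array([5.0, 5.0])
w = np.array([1.0, 2.0])  # predicts exactly 5.0 for both examples
```

With these weights the predictions match the targets, so the averaged loss is zero; any other w gives a positive loss.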

The ML model f() can be anything but in modern machine learning f() is normally implemented by a deep neural network (DNN) which is itself just a set of nested functions.

For example, an n-layered DNN can be represented mathematically by:

**f(x; w) = fₙ(fₙ₋₁(… f₂(f₁(x; w₁); w₂) …; wₙ₋₁); wₙ)**

So we simply transform the input with layer 1, given by f₁(), feed the result to layer 2, given by f₂(), and so on up to the last layer, whose output is the output of our function f().
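The nested-function view of a DNN translates almost literally into a forward pass. Below is a minimal sketch; the layer structure (affine map plus ReLU) and all names are illustrative assumptions, since the formula does not fix what each fₖ looks like.

```python
import numpy as np

def relu(z):
    # element-wise non-linearity applied inside each layer
    return np.maximum(z, 0.0)

def layer(x, W, b):
    # one layer f_k(x; w_k): an affine map followed by the non-linearity
    return relu(W @ x + b)

def forward(x, params):
    # f(x; w) = f_n(... f_2(f_1(x; w_1); w_2) ...; w_n)
    for W, b in params:
        x = layer(x, W, b)
    return x

# three identity layers (W = I, b = 0) leave a positive input unchanged
x = np.array([1.0, 2.0])
params = [(np.eye(2), np.zeros(2))] * 3
out = forward(x, params)
```

The loop makes the nesting explicit: each layer's output becomes the next layer's input.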

Training such a model requires some tricks from optimization theory. In fact the idea is that we can reduce the error of the model if we can find the direction in which to update the weight parameters of each layer.

That is possible if we compute the derivatives of the objective L with respect to the weights and update the parameters using the gradient descent update rule:

**w ← w − λ ∂L/∂w**

where λ = learning rate

The process may get stuck in a local minimum, but it works well in practice because many local minima are effectively equivalent, that is, they yield similarly small losses.
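The gradient descent update rule above can be sketched end to end on the simplest possible model, a one-parameter linear regressor trained on the mean squared error. The data, learning rate, and iteration count are illustrative choices.

```python
import numpy as np

# synthetic data: y = 3 * x, so the true weight is 3.0
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 100)
y = 3.0 * x

w = 0.0    # initial weight
lam = 0.1  # learning rate (the lambda in the update rule)
for _ in range(200):
    pred = w * x
    # dL/dw for L = (1/n) * sum_i (w * x_i - y_i)^2
    grad = 2.0 * np.mean((pred - y) * x)
    # w <- w - lambda * dL/dw
    w = w - lam * grad
```

Each step moves w a small distance against the gradient, so the weight converges to the true value 3.0.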
