**K-Nearest Neighbors**is one of the most basic yet essential classification algorithms in Machine Learning.

It belongs to the supervised learning domain and finds intense application in pattern recognition, data mining and intrusion detection.

**Basic k-nearest neighbor classification**

Training method :

- Save the training examples

At prediction time :

- Find the k training examples (X1,y1),.....(xk,yk) that are closest to the test example x

- Classification : Predict the most frequent class among those y1's.

- Regression : Predict the average of among the yi's.

Improvements :

- Weighting examples from the neighborhood

- Measuring "closeness"

- Finding "close" examples in a large training set quickly

**Average of k points more reliable when :**

- nose in attributes

- noise in class labels

**Equal weight to all attributes :**

- Only of the scale of the attributes are similar and differences

- Scale attribute to equal range and equal various

- Classes are spherical

In above image there are three classes red, green, black.

Small K :- Capture fine structure of problem space better, Its may be necessary for small training set

Large K :- The less sensitive to noise (particularly class noise). we get better probability estimate for discrete classes.Larger training training set allows use of larger k.

**Weighted Euclidean Distance**

large weights => attribute is more important

small weights => attribute is less important

zero weights => attribute does't matter

- Weights allow KNN to be effective with axis-parallel elliptical classes

- Where do weights come from ?

**Distance-Weight KNN**

tradeoff between small and large k can be difficult

- use large k, but more emphasis on nearer neighbor ?

**Locally Weighted Averaging**

Let k = number of training points

Let weight fall-off rapidly with distance

KernelWidth control size of neighborhood that has large effect on value (analogous to k)

## No comments:

## Post a Comment