Hi all, in this post we discuss what the K-Nearest Neighbors (KNN) algorithm is all about. If you ever come across a classification problem, KNN might be the simplest of all the classification algorithms you could apply.

An implementation of the algorithm in Python is available here.

The idea behind KNN is very intuitive. Assume the training data contains points from different classes (for example, squares and triangles) and we need to assign a test point to one of these classes. Here is what we do:

[Image: KNN classification example; the green circle is the test point to be classified. Credit: Wikipedia]

- Let k = 3.
- Select a point from the test data (the green circle in the image above).
- Find its 3 nearest neighbors in the training data.
- Assign the test point to the class that occurs most often among those 3 neighbors (here, red triangle, since 2 of the 3 neighbors are red triangles).
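The last step is just a majority vote over the neighbors' labels. A minimal sketch in Python using `collections.Counter`; the text only tells us that two of the three neighbors are red triangles, so the third label ("blue square") is an assumption for illustration:

```python
from collections import Counter

# Labels of the 3 nearest neighbors of the green circle.
# Assumption: the third neighbor is a blue square.
neighbor_labels = ["red triangle", "red triangle", "blue square"]

# most_common(1) returns a one-element list: [(label, count)]
predicted, votes = Counter(neighbor_labels).most_common(1)[0]
print(predicted, votes)  # red triangle 2
```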

Now, let us look at the method described above in algorithm form.

For the classification to work, we assume:

- labeled training data is available.
- a value of k has been chosen.
Algorithm:

Step1: load the training data into your program.

Step2: initialize the value of k.

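Step3: for each test data item:

- for every training data item, calculate its distance to the test item.
- arrange the distances in ascending order and put them in a list.
- select the first k distances in the sorted list; the training items they belong to are the k nearest neighbors.
- assign the test item to the class which occurs the most among those k neighbors.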
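The steps above translate fairly directly into Python. This is a minimal sketch, not the implementation linked at the top of the post: it assumes each training item is a `(features, label)` pair with numeric feature tuples, and it uses Euclidean distance via `math.dist` (Python 3.8+) because the steps do not name a distance metric. The function name `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training_data, test_data, k):
    """Return a predicted label for each feature tuple in test_data.

    training_data: list of (features, label) pairs (assumed format).
    """
    predictions = []
    for query in test_data:  # Step3: for each test data item
        # Distance from the query to every training item (Euclidean, assumed)
        distances = [(math.dist(query, features), label)
                     for features, label in training_data]
        distances.sort(key=lambda pair: pair[0])  # ascending order
        k_nearest = distances[:k]                 # first k entries
        labels = [label for _, label in k_nearest]
        # Majority vote among the k nearest neighbors
        predictions.append(Counter(labels).most_common(1)[0][0])
    return predictions

# Hypothetical toy data: two well-separated clusters
training = [
    ((1.0, 1.0), "square"), ((1.2, 0.8), "square"), ((0.9, 1.1), "square"),
    ((5.0, 5.0), "triangle"), ((5.2, 4.9), "triangle"), ((4.8, 5.1), "triangle"),
]
print(knn_classify(training, [(1.1, 1.0), (5.1, 5.0)], k=3))  # ['square', 'triangle']
```

On ties: `Counter.most_common` orders equal counts by first appearance, and because the labels are collected nearest first, a tie goes to the class of the closest tied neighbor.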

Pros:

- Intuitive.
- Simple to implement.
- Simple to understand.

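Cons:

- Computationally expensive at prediction time: every test item has to be compared with every training item, so the whole training set must be stored and scanned for each prediction.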
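To put a number on that cost: with n training items, each prediction needs n distance computations, so classifying m test items takes m × n of them. In this basic form the distance computations cannot be skipped, but the sorting step can be trimmed. A minimal sketch, swapping Python's `heapq.nsmallest` in for the full sort, assuming training items are `(features, label)` pairs and Euclidean distance; `knn_classify_one` and the toy data are hypothetical:

```python
import heapq
import math
from collections import Counter

def knn_classify_one(training_data, query, k):
    """Classify one query; training_data is a list of (features, label) pairs."""
    # Still one Euclidean distance per training item: n computations per query.
    # nsmallest returns the k closest items (nearest first) without sorting
    # all n distances, roughly O(n log k) instead of O(n log n).
    k_nearest = heapq.nsmallest(
        k, training_data, key=lambda item: math.dist(query, item[0])
    )
    labels = [label for _, label in k_nearest]
    return Counter(labels).most_common(1)[0][0]

training = [
    ((1.0, 1.0), "square"), ((1.2, 0.8), "square"), ((0.9, 1.1), "square"),
    ((5.0, 5.0), "triangle"), ((5.2, 4.9), "triangle"), ((4.8, 5.1), "triangle"),
]
print(knn_classify_one(training, (5.1, 5.0), k=3))  # triangle
```

This only helps when k is much smaller than n; the n distance computations per query remain, which is why KNN stays expensive at prediction time.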

Additional info:

- courses.cs.tamu.edu/rgutier/cs790_w02/l8.pdf
- http://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm

