Monday, May 23, 2011

KNN classification algorithm

Hi all, in this post we discuss what the 'K-Nearest Neighbors Algorithm' (KNN) is all about. If you ever come across a classification problem, KNN may well be the simplest classification algorithm you could apply.


An implementation of the algorithm in Python is available here.



Consider the following image:


credit: Wikipedia

The idea behind KNN is very intuitive. Assume we have 2 classes (squares and triangles) and we need to assign a test point to one of them. So what we do is:

  1. Choose a value of k, say k=3.
  2. Take a point from the test data (the green circle in the image above).
  3. Find its 3 nearest neighbors.
  4. Assign the test point to the class that occurs most often among those 3 neighbors (here, 2 of the 3 are red triangles, so the point is classified as a triangle).

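The four steps above can be sketched in a few lines using scikit-learn's KNeighborsClassifier (a minimal sketch, assuming scikit-learn is installed; the data points are made up for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Training data: two known classes, like the squares and triangles in the image
X_train = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
y_train = ["square", "square", "square", "triangle", "triangle", "triangle"]

clf = KNeighborsClassifier(n_neighbors=3)  # step 1: choose k = 3
clf.fit(X_train, y_train)                  # steps 2-3 happen lazily at predict time

# Step 4: classify a test point by majority vote of its 3 nearest neighbors
print(clf.predict([[8, 8.5]])[0])  # → triangle
```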

Now, let us look at the method described above in algorithm form.

In order for the classification to occur, we assume:
  • training data is available.
  • a value of k has been chosen.
Algorithm:

Step 1: initialize the training data (load it into your program).
Step 2: initialize the value of k.
Step 3: for each test data item:
                 for every training data item:
                      calculate the distance between the two items
                 sort the distances in ascending order
                 select the first k entries in the sorted list
                 assign the test item to the class that occurs most often among those k entries.
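The algorithm above can be written out from scratch with nothing but the standard library. This is a minimal sketch, not the post's linked implementation; the names `knn_classify`, `training_data`, and so on are illustrative:

```python
import math
from collections import Counter

def knn_classify(training_data, labels, test_point, k=3):
    """Classify test_point by majority vote among its k nearest training points."""
    # Step 3a: compute the distance from the test point to every training item
    distances = []
    for features, label in zip(training_data, labels):
        d = math.dist(features, test_point)  # Euclidean distance (Python 3.8+)
        distances.append((d, label))
    # Step 3b: sort the distances in ascending order
    distances.sort(key=lambda pair: pair[0])
    # Step 3c: take the first k entries and vote on the class
    top_k = [label for _, label in distances[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy data: two classes, as in the figure (squares vs. triangles)
train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
classes = ["square", "square", "square", "triangle", "triangle", "triangle"]
print(knn_classify(train, classes, (5.1, 5.0), k=3))  # → triangle
```

Note that KNN does no work at training time; all the computation happens when a test point is classified, which is exactly the weakness listed under "Cons" below.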

Pros:
  • Intuitive.
  • Simple to implement.
  • Simple to understand.

Cons:

  • Computationally expensive: classifying each test point requires computing its distance to every training point.
