Tuesday, May 24, 2011

KNN classifier Python

This post follows my previous article on the KNN classifier and covers its implementation in Python.

My other machine learning articles will be posted here.

The data set required to run this program can be found here: train.txt and test.txt.


DISCLAIMER: I DON'T OWN THE DATASET.

The above data set was DERIVED from the famous Iris Flower dataset.


This program uses Matplotlib and NumPy.

If you have not installed them yet (on Ubuntu), install them using the following commands:

sudo apt-get install python-matplotlib
sudo apt-get install python-numpy
sudo apt-get install python-scipy


The program is as follows:



Output:



Plot:



As the value of k increases, accuracy improves up to a point, then stops improving and in fact starts to decrease.
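
The effect of k on accuracy can be checked with a small sweep like the one below. This is only a minimal sketch and not the program from this post: the function names and the tiny two-feature data set are made up for illustration, and it uses plain Python (3.8 or newer, for math.dist) instead of NumPy and Matplotlib.

```python
import math
from collections import Counter

def predict(train, point, k):
    """Majority class among the k training points nearest to `point`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test points whose predicted class matches the true one."""
    hits = sum(predict(train, features, k) == label for features, label in test)
    return hits / len(test)

# Made-up (features, label) pairs, NOT the train.txt / test.txt files.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.3), "a"),
         ((3.0, 3.0), "b"), ((3.2, 2.9), "b"), ((2.0, 2.0), "b")]
test = [((1.1, 1.1), "a"), ((2.9, 3.1), "b"), ((1.6, 1.6), "a")]

# Print one accuracy value per candidate k.
for k in (1, 3, 5):
    print(k, accuracy(train, test, k))
```

On a real data set you would run the same loop over a wider range of k and plot the resulting accuracies to see where they level off.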

Please leave a comment if you find this article useful.

Monday, May 23, 2011

KNN classification algorithm

Hi all, in this post we discuss what the 'K-Nearest Neighbors Algorithm' is all about. If you ever come across a classification problem, K-Nearest Neighbors might be the simplest of all the classification algorithms you could apply.


A Python implementation of the algorithm is available here.



Consider the following image:


credit: Wikipedia

The idea behind KNN is very intuitive. Assume we have two classes (blue squares and red triangles) and we need to assign a test point to one of them. So what we do is:

  1. Choose k = 3.
  2. Select a point from the test data (the green circle in the above image).
  3. Find its 3 nearest neighbors in the training data.
  4. Assign the test point to the class that occurs most often among those 3 neighbors (here it is the red triangles, with 2 of the 3).


Now, let us look at the method described above in algorithm form.

In order for the classification to occur, we assume:
  • training data is available.
  • a value of k has been chosen.
Algorithm:

Step1: initialize training data (load it in your program).
Step2: initialize value of k
Step3: for each test data item:
                 for every training data item:
                      calculate distance to the test data item
                 arrange distances in ascending order and put them in a list.
                 select first k distances in sorted list.
                 assign the class which occurs the most among those k items.
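
The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: Euclidean distance is assumed as the distance measure, and the names knn_classify, training_data and test_data, as well as the sample points, are made up for this example.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training_data, test_point, k):
    # Step 3a: distance from the test point to every training item.
    distances = [(euclidean(features, test_point), label)
                 for features, label in training_data]
    # Step 3b: ascending order by distance, then keep the first k.
    distances.sort(key=lambda pair: pair[0])
    k_nearest = distances[:k]
    # Step 3c: the class that occurs most often among the k nearest.
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# Step 1: made-up training data as (features, label) pairs.
training_data = [((1.0, 1.0), "square"), ((1.5, 2.0), "square"),
                 ((5.0, 5.0), "triangle"), ((5.5, 4.5), "triangle"),
                 ((4.8, 5.2), "triangle")]
test_data = [(1.2, 1.4), (5.1, 4.9)]

# Step 2: k = 3, applied to each test point in turn.
for point in test_data:
    print(point, knn_classify(training_data, point, k=3))
```

An odd k avoids tied votes when there are only two classes; with more classes ties are still possible and this sketch simply returns the first of the tied classes it counted.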

Pros:
  • Intuitive.
  • Simple to implement.
  • Simple to understand.

Cons:

  • Computationally expensive: classifying each test item requires computing its distance to every training item.

Additional info:

Friday, May 20, 2011

Memory Management for OpenCV applications

Memory management in OpenCV is an important task because:
  1. Most OpenCV code is written in C or C++, where programmers have to deal with pointers and references themselves (I have not worked with Python programming in OpenCV).
  2. These languages have no garbage collection (C++ destructors free resources when an object goes out of scope, but structures created through the C API must still be released by hand).
I'm talking about managing memory where one has to deal with thousands of images (a video stream, a stream of images over the network, etc.) in real time.

Update: I would suggest moving to the C++ API or Python API of OpenCV instead of the C API. They have better memory management internally.

Btw, My other programs in OpenCV will be posted here.


Things NEVER to do in OpenCV:
  • DECLARE structures, variables in a loop


    instead it should be (assuming function will return the right image):



  • use "create" functions in a loop. (cvCreateImage, cvCreateMat, etc).


    instead, it should be something as:




Things to do:

  • recycling: reuse already allocated memory ( img_src = find_a_face(img_src) )
  • pass by reference instead of pass by value.
  • release images, mats, storage, etc. once you are done with them ( cvReleaseImage(&img_src) ).
  • try to identify variables that remain constant throughout the loop and declare them separately (maybe in the constructor).
  • make use of static variables wherever necessary.
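
The recycling idea is not specific to C. The sketch below shows it in plain Python, with a bytearray standing in for an image buffer. It does not call OpenCV at all, and process_frame, run, WIDTH and HEIGHT are hypothetical names chosen for this illustration.

```python
WIDTH, HEIGHT = 640, 480  # assumed frame size, for illustration only

def process_frame(src, dst):
    """Write an inverted copy of `src` into the caller-supplied `dst` buffer."""
    for i, value in enumerate(src):
        dst[i] = 255 - value

def run(frames):
    # Allocate the output buffer once, outside the loop ...
    scratch = bytearray(WIDTH * HEIGHT)
    checksums = []
    for frame in frames:
        # ... and reuse it for every frame instead of creating a new one.
        process_frame(frame, scratch)
        checksums.append(sum(scratch))
    return checksums

# Two synthetic frames: all black, then all white.
frames = [bytes(WIDTH * HEIGHT), bytes([255]) * (WIDTH * HEIGHT)]
print(run(frames))
```

In the C API the same shape would mean one cvCreateImage call before the loop and one cvReleaseImage call after it, rather than one pair per iteration.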

The AI Shack website also has some material on OpenCV memory management: http://www.aishack.in/2010/01/opencv-memory-management/

Wednesday, May 18, 2011

Creating a Local Ubuntu Mirror using apt-mirror

The pre-requisites:
  1. Have a running Ubuntu installation.
Installation:
  • Type the command below in the terminal:
    sudo apt-get install apt-mirror
  • Alternatively, install the package "apt-mirror" from Synaptic if you prefer a GUI, or use aptitude from the command line.
Configuration:
  • Edit the configuration file /etc/apt/mirror.list. You might find the lines below by default:

    ############# config ##################
    #
    # set base_path    /var/spool/apt-mirror
    #
    # set mirror_path  $base_path/mirror
    # set skel_path    $base_path/skel
    # set var_path     $base_path/var
    # set cleanscript $var_path/clean.sh
    # set defaultarch  
    # set postmirror_script $var_path/postmirror.sh
    # set run_postmirror 0
    set nthreads     20
    set _tilde 0
    #
    ############# end config ##############
    
    deb http://archive.ubuntu.com/ubuntu maverick main restricted universe multiverse
    deb http://archive.ubuntu.com/ubuntu maverick-security main restricted universe multiverse
    deb http://archive.ubuntu.com/ubuntu maverick-updates main restricted universe multiverse
    #deb http://archive.ubuntu.com/ubuntu maverick-proposed main restricted universe multiverse
    #deb http://archive.ubuntu.com/ubuntu maverick-backports main restricted universe multiverse
    
    deb-src http://archive.ubuntu.com/ubuntu maverick main restricted universe multiverse
    deb-src http://archive.ubuntu.com/ubuntu maverick-security main restricted universe multiverse
    deb-src http://archive.ubuntu.com/ubuntu maverick-updates main restricted universe multiverse
    #deb-src http://archive.ubuntu.com/ubuntu maverick-proposed main restricted universe multiverse
    #deb-src http://archive.ubuntu.com/ubuntu maverick-backports main restricted universe multiverse
    
    clean http://archive.ubuntu.com/ubuntu
    
    
  • This may be rewritten as:

    ############# config ##################
    #
    set base_path    /home/myusername/apt-mirror
    set mirror_path  $base_path/mirror
    set skel_path    $base_path/skel
    set var_path     $base_path/var
    set cleanscript $var_path/clean.sh
    set defaultarch  i386
    set postmirror_script $var_path/postmirror.sh
    set run_postmirror 0
    set nthreads     20
    set _tilde 0
    #set limit_rate 2.5k
    #
    ############# end config ##############
    
    
    deb http://archive.ubuntu.com/ubuntu maverick main restricted universe multiverse
    deb http://archive.ubuntu.com/ubuntu maverick-security main restricted universe multiverse
    deb http://archive.ubuntu.com/ubuntu maverick-updates main restricted universe multiverse
    deb http://archive.canonical.com/ubuntu maverick partner
    deb http://extras.ubuntu.com/ubuntu maverick main
    
    
    clean http://archive.ubuntu.com/ubuntu
    clean http://archive.canonical.com
    clean http://extras.ubuntu.com
    

What I have done in this configuration:
  • base_path: The directory where the downloaded packages go.
  • mirror_path, skel_path, var_path, cleanscript, postmirror_script, run_postmirror, _tilde: Better to leave them as they are.
  • defaultarch: The architecture fetched for plain "deb <deb-url>" lines. You can also pick an architecture explicitly by writing the line as "deb-i386 <deb-url>".
  • nthreads: The number of wget threads launched to fetch the mirror contents.
  • limit_rate: Limits the download rate of wget, per thread. If it is set to 2.5k with 20 threads, the total works out to (20 threads) * 2.5 KBps = 50 KBps (note the 'B': it stands for bytes).
  • I just picked up the deb URLs from my file "/etc/apt/sources.list".
  • I skipped the "deb-src" URLs because I don't need the sources and they would just eat up my space. It depends on what the mirror is for; if you need them, include them in the configuration.
  • deb-urls: These are of the form "deb http://archive.ubuntu.com/ubuntu maverick-updates main restricted universe multiverse". Replace maverick with the codename that you want to mirror; maverick is the codename for Ubuntu 10.10. You can find out which version you are using with the command "lsb_release -a". If you just need the codename, use "lsb_release -c". You can mirror any number of archives.
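
The per-thread arithmetic for limit_rate can be checked with a few lines of Python. This is a rough sketch rather than anything apt-mirror itself provides: parse_settings and total_rate_kb are hypothetical helper names, the sample text has limit_rate uncommented (it is commented out in the configuration above), and only rates written with a "k" suffix are handled.

```python
def parse_settings(config_text):
    """Collect active `set name value` lines; commented lines are skipped."""
    settings = {}
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "set":
            settings[parts[1]] = parts[2]
    return settings

def total_rate_kb(settings):
    """Aggregate download cap in KB/s, or None when limit_rate is unset."""
    rate = settings.get("limit_rate")
    if rate is None:
        return None
    if not rate.lower().endswith("k"):
        raise ValueError("this sketch only handles rates written like 2.5k")
    # The limit applies to each wget thread, so multiply by the thread count.
    return int(settings["nthreads"]) * float(rate[:-1])

sample = """\
set nthreads     20
set _tilde 0
set limit_rate 2.5k
#set defaultarch i386
"""
print(total_rate_kb(parse_settings(sample)))
```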
Start Mirroring:
  • I would recommend mirroring the archive as a normal user rather than as root using "sudo".
  • Run the command "apt-mirror" after you're done with the configuration and wait for it to complete.
  • If you'd like to run the command in the background, you can issue the command "apt-mirror &".
  • Warning: Set the base_path variable only to a folder for which the user running "apt-mirror" has write permissions.
Setting it up for daily updates using a cron job:
  • Edit the file "/etc/cron.d/apt-mirror". This file is available after a successful installation of the "apt-mirror" package.
  • You'll find some default commented lines. You can change and uncomment them, or you can insert "@daily myusername /usr/bin/apt-mirror > /home/myusername/apt-mirror/cron.log && /home/myusername/apt-mirror/var/clean.sh"
  • Adding the above line schedules a cron job that starts the command every day at midnight. It downloads the updates for you and cleans up older packages (when a newer deb package has arrived) and unwanted packages (when you have changed your configuration file after a download and re-run the script).
  • Warning: Try the command manually before adding it to the cron job, and check that the download starts successfully and has write permissions. This is just a fail-safe.
The mirroring is complete. What to do now?
  • Install Apache http server using the command "sudo apt-get install apache2".
  • Now all you have to do is create a symbolic link in the /var/www folder which is the root of apache2 web server in Ubuntu:
    • I'll create a new folder in /var/www with name ubuntu by using the command "sudo mkdir /var/www/ubuntu".
    • Now browse to that folder using "cd /var/www/ubuntu" and create symbolic links to mirrors by issuing the commands below:
      ln -s /home/myusername/apt-mirror/mirror/archive.canonical.com/ archive-canonical
      ln -s /home/myusername/apt-mirror/mirror/archive.ubuntu.com/ archive-ubuntu
      ln -s /home/myusername/apt-mirror/mirror/extras.ubuntu.com/ extras-ubuntu
      
    • I've used the names archive-canonical, archive-ubuntu and extras-ubuntu only for my own reference; you are free to use whichever names you like.
    • The folders under apt-mirror/mirror, such as archive.canonical.com and archive.ubuntu.com, are there because you specified them in your configuration file, i.e. mirror.list. If you specify a deb URL such as "deb http://extras.ubuntu.com/ubuntu maverick main", the script will create a folder extras.ubuntu.com under the apt-mirror/mirror directory in our case, or under whatever path you have set in the "mirror_path" variable.
  • That is all for the configuration.
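
The relation between a deb line and the folder it ends up in can be written down as a small Python function. This is a sketch of the layout described above, not code taken from apt-mirror: mirror_directory is a hypothetical name, and the assumption is that the folder is the mirror path followed by the host name and the URL path.

```python
import posixpath
from urllib.parse import urlparse

def mirror_directory(deb_line, mirror_path):
    """Return the folder under `mirror_path` that a deb line is mirrored into."""
    fields = deb_line.split()
    if len(fields) < 3 or not fields[0].startswith("deb"):
        raise ValueError("not a deb line: %r" % deb_line)
    url = urlparse(fields[1])
    # Assumed layout: <mirror_path>/<host>/<path from the URL>
    return posixpath.join(mirror_path, url.netloc, url.path.lstrip("/"))

print(mirror_directory("deb http://extras.ubuntu.com/ubuntu maverick main",
                       "/home/myusername/apt-mirror/mirror"))
```

The symbolic links created earlier point at the host-name level of this path, which is why the URLs served by Apache still end in /ubuntu.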
The final step: using the mirror to update the packages:
  • Pre-requisite: Make sure that port 80 is open on the machine serving the mirror.
  • Edit the /etc/apt/sources.list file on the machine that will use the mirror, changing it from (use the below only as an example):

    deb http://archive.canonical.com/ubuntu maverick partner
    deb http://extras.ubuntu.com/ubuntu maverick main
    deb http://security.ubuntu.com/ubuntu/ maverick-security main restricted
    deb http://security.ubuntu.com/ubuntu/ maverick-security universe
    deb http://security.ubuntu.com/ubuntu/ maverick-security multiverse
    
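  • To the following (the name after http://localhost/ubuntu/ must match a symbolic link that exists under /var/www/ubuntu):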
    deb http://localhost/ubuntu/archive-canonical/ubuntu maverick partner
    deb http://localhost/ubuntu/extras-ubuntu/ubuntu maverick main
    deb http://localhost/ubuntu/security-ubuntu/ubuntu/ maverick-security main restricted
    deb http://localhost/ubuntu/security-ubuntu/ubuntu/ maverick-security universe
    deb http://localhost/ubuntu/security-ubuntu/ubuntu/ maverick-security multiverse
    
  • Replace localhost with the IP address of the mirror machine if you are accessing it over a network.
  • Run the commands:
    sudo apt-get update
    sudo apt-get upgrade
    
  • That should do it: your system now has the latest updates installed (if you've mirrored recently).
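
If there are many lines to convert, the rewrite can be scripted. The Python sketch below is only an illustration: to_local and LINKS are hypothetical names, the link names are the ones used in this post, and any host missing from the table is left pointing at its upstream server.

```python
from urllib.parse import urlparse

# Upstream host -> symbolic link name under /var/www/ubuntu (names from this post).
LINKS = {
    "archive.canonical.com": "archive-canonical",
    "archive.ubuntu.com": "archive-ubuntu",
    "extras.ubuntu.com": "extras-ubuntu",
}

def to_local(line, mirror_host="localhost", links=LINKS):
    """Point one sources.list deb line at the local mirror, if it is mirrored."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "deb":
        return line  # comments, blank lines and deb-src lines are kept as-is
    url = urlparse(fields[1])
    link = links.get(url.netloc)
    if link is None:
        return line  # this host has no symbolic link, keep the upstream URL
    fields[1] = "http://%s/ubuntu/%s%s" % (mirror_host, link, url.path)
    return " ".join(fields)

print(to_local("deb http://extras.ubuntu.com/ubuntu maverick main"))
```

Passing mirror_host="192.168.1.10" (or whatever address the mirror machine has) produces the lines for clients on the network.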

ssh Host key verification failed problem in Linux

Sometimes you might run into a host key verification failure while connecting over ssh to one of your machines.

You may see something like this:



The simplest solution is to open the /home/your_username/.ssh/known_hosts file and delete its contents. You will then have to re-establish all your ssh connections and accept the host keys again.
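
A gentler variant is to remove only the entry for the host that changed. The Python sketch below works on the text of the file; drop_host is a hypothetical helper, and it only matches plain (unhashed) host names. Entries in a hashed known_hosts file start with "|1|" and are not matched here; the stock command "ssh-keygen -R hostname" covers that case.

```python
def drop_host(known_hosts_text, host):
    """Return the known_hosts text without the lines that list `host`."""
    kept = []
    for line in known_hosts_text.splitlines():
        fields = line.split()
        if fields and not fields[0].startswith("#"):
            # An optional marker such as @cert-authority precedes the host list.
            has_marker = fields[0].startswith("@") and len(fields) > 1
            names = fields[1] if has_marker else fields[0]
            if host in names.split(","):
                continue  # skip (delete) this entry
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")

# Made-up entries with shortened keys, for illustration only.
sample = "192.168.1.5 ssh-rsa AAAA\nexample.org,10.0.0.2 ssh-rsa BBBB\n"
print(drop_host(sample, "192.168.1.5"))
```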

Then the smooth flow of actions would look something like this:


Sunday, May 15, 2011

Setup ieee 1394 camera with OpenCV support

Hi folks,
I'm sure that some of you have tried to interface an IEEE 1394 FireWire camera with Linux without success.


This is how I got an Imaging Source IEEE 1394 camera working on Ubuntu 10.04 using libdc1394 / Coriander.



Follow the steps:
  1. Plug your camera into the 6-pin IEEE 1394 port on your PC. (Obviously ;) )
  2. Install the libraw1394 and libdc1394 libraries and coriander.

  3. Open a text file and save the following as bash script:



  4. You will usually need to run the above script before running any program, especially if something went wrong at runtime or a previous run ended in a segmentation fault. The script makes sure that everything is in place for your libdc1394 program to execute. (You don't have to chmod every time, so it is basically the first 2 lines of code that you want to use.)
  5. Write your program against the C/C++ API of libdc1394 and execute it.
  6. You can also make sure that your camera is working using coriander (just go to a command line terminal and execute the command $ coriander to open it).
**Update**

Code to convert dc1394video_frame_t to IplImage:

(thanks to Rohith Reddy for the following code):



Most of the documentation available on the web is for very old versions of libdc1394. For the latest documentation and working example programs visit

Old resources/documentation for libdc1394 are available at the following links:

Cheers,
Rahul Kavi.