Sunday, December 23, 2012

Review: Mastering OpenCV with Practical Computer Vision Projects

I got hold of a new book titled "Mastering OpenCV with Practical Computer Vision Projects". It's a really good book if you are familiar with OpenCV and want to try out some interesting and cool stuff on your desktop/tablet/phone. I have been working with OpenCV for quite a while now. This book covers a variety of projects that can be done with one's OpenCV skills and that could be really fun to try out in the real world. It's a great starting point to flex your OpenCV skills. I bet that if you are new to OpenCV, you will find this very exciting once you've understood the code that does all the cool stuff covered in the book (number plate recognition, face recognition, augmented reality, etc.).

I personally liked the part on Point Cloud Library and OpenCV. I also liked Chapter 9, where they discuss Kinect and OpenCV. This chapter is provided as a digital download (as a link).

However, these projects have to be taken with a grain of salt. One must understand that Computer Vision is a very active area of research. There is no single API that does everything and works in every single use-case. It's not like web programming, where you write a piece of code and expect it to work as it was designed to. Not everything is provided as an API. For example, you cannot expect the number plate recognition system to work with all kinds of number plates. A specific technique may only work for the specific data-set it was built around.

The techniques mentioned in this book are limited to the API that OpenCV provides. However, they provide a great starting point if you want to do cool stuff with OpenCV. If you are aware of other algorithms that are more efficient than the default ones implemented in OpenCV, you will know how to put them to better use after reading this book.

If you are not familiar with OpenCV, I suggest you get the book titled "OpenCV 2 Computer Vision Application Programming Cookbook" before trying this one.

Monday, December 17, 2012

Neural Networks in Octave

I wrote a simple Feed Forward Neural Network in Octave. It is a 2 Hidden Layer Feed Forward Neural Network (200 x 100 and 100 x 50) which was trained on the MNIST dataset (2400 training, 2400 testing images). Digit images are of size 28 x 28. I have also trained it on images collected from an Android device (1000 images = 600 training, 200 testing, 200 cross validation set). I have uploaded the source to my git repo.

You can clone the source using the following command:


I got the following results by comparing it with 1 Hidden Layer Feed Forward Neural Network and Logistic Regression.



Check out the 2 hidden layer network at work in the following video.

Tuesday, August 28, 2012

Ubuntu OpenCV common install dependencies

For the folks who are sick of re-installing packages after a fresh install.

The following is a common install for an image processing machine running Ubuntu:


You may have to install OpenCV by following the instructions here.

Update: There is another, simpler way if you have access to the Synaptic package manager (in 12.04 Synaptic is not installed by default). All you have to do is mark the installed packages and generate a script (from the File menu) that installs them. Then load the script on a new install and execute it.

Tuesday, July 31, 2012

Read XML in OpenCV 2.1 Python

I'm sure many of you folks mix Python and C++ for your OpenCV needs, as not all the libraries are available in both OpenCV APIs.

OpenCV cannot read or write images in CV_32F format. One way to overcome this is to store the image as a matrix (in XML or YAML).

Here, I'm trying to read an XML file (which was generated using the FileStorage class in C++) in Python.

The Python API is still evolving and is not as well defined in terms of naming conventions as the C++ API.


Monday, July 30, 2012

Adaboost

Adaboost is an ensemble supervised learning technique that aims to improve the performance of a set of weak classifiers.

Adaboost is a simple concept, i.e. make a decision based on the decisions output by a set of weak classifiers. It turns out that a set of weak classifiers can form a strong classifier.

I made an implementation of this algorithm in Python (basically a re-implementation of the adaboost tutorial, which I found to be very useful and which I later used in a project of mine).

Assuming

X = {(x1, y1), (x2, y2), (x3, y3).... (xm, ym)}, where m is the number of data points and each xi is a feature vector of length N.

Y is the label for each data point. So Y would be something like {+1, -1, +1 ,......} assuming this is a binary classification problem.

The final output of the algorithm:

The adaboost testing phase:
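In the standard formulation of Adaboost, which matches the terms listed below, the output can be written as follows (the symbols f, H, αt and ht are the ones defined in the list):

```latex
f(x) = \sum_{t=1}^{T} \alpha_t \, h_t(x), \qquad H(x) = \operatorname{sign}\bigl(f(x)\bigr)
```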



  1. t runs from 1 to T and indexes the T weak classifiers.
  2. αt is the weight assigned to the classifier at t.
  3. ht(x) is the weak classifier at t.
  4. H(x) is the sign of f(x), i.e. the classification output for feature vector x.
Apparently, you can also consider ht(x) to be a "weak feature", though I cannot explain that to you because I haven't encountered such a situation.

The goal of the training will be to find the right αt.

The training algorithm:

In adaboost, at each iteration t (from t = 1 to T) a new classifier is trained. Initially, each datapoint is assigned a weight (1/(number of data points)) and sent to the classifier.

Do the following for t = 1 to T:
  1. The classifier returns the classification result for each datapoint.
  2. Error(t) = sum of weights of mis-classified points.
  3. If Error(t) >= 1/2, break; the classifier is no better than chance.
  4. alpha(t) = (1/2) * ln((1 - Error(t)) / Error(t)).
  5. Dt+1(i) = (Dt(i) * exp(-alpha(t) * yi * ht(xi))) / Zt, so the weight of a correctly classified datapoint shrinks and the weight of a mis-classified datapoint grows.
  6. Dividing by Zt normalizes all the weights to sum to 1.
Zt is a normalization factor which turns ΣDt+1(i) to 1.
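The loop can be sketched in a few lines of plain Python. This is a minimal sketch of the standard Adaboost formulation, not the implementation mentioned above: the weak classifier is assumed to be a one-dimensional decision stump, and the names stump_predict, best_stump, adaboost_train and adaboost_predict, as well as the toy data, are made up for illustration.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: `polarity` if x is above the threshold, else -polarity."""
    return polarity if x > threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (+1, -1):
            # Error(t): sum of the weights of the mis-classified points.
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost_train(xs, ys, T):
    n = len(xs)
    weights = [1.0 / n] * n          # every datapoint starts with weight 1/n
    model = []                       # list of (alpha, threshold, polarity)
    for _ in range(T):
        err, threshold, polarity = best_stump(xs, ys, weights)
        if err >= 0.5:               # no better than chance: stop
            break
        err = max(err, 1e-10)        # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1.0 - err) / err)
        model.append((alpha, threshold, polarity))
        # Shrink weights of correct points, grow weights of mis-classified ones.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        z = sum(weights)             # normalization factor Zt
        weights = [w / z for w in weights]
    return model

def adaboost_predict(model, x):
    """H(x): sign of the weighted sum of the weak classifiers."""
    f = sum(alpha * stump_predict(x, th, pol) for alpha, th, pol in model)
    return 1 if f >= 0 else -1

# Toy data that no single stump can separate.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [+1, +1, -1, -1, +1, +1]
model = adaboost_train(xs, ys, T=3)
predictions = [adaboost_predict(model, x) for x in xs]
```

On this toy data no single stump gets all six labels right, but the weighted vote of three stumps does, which is the "weak classifiers form a strong classifier" idea in miniature.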

It is important to note that the data which we are operating on is the same in each iteration. The things that change in each iteration are Error(t), Dt and alpha(t). These variables are re-assigned in each iteration, which is vital to Adaboost.

So, the final output is a weighted linear sum of weak classifiers which, when used together, give the performance of a strong classifier.

Interesting facts:
  1. It is observed that Adaboost doesn't generally overfit the data (which is a good thing).
  2. It is said that Adaboost doesn't work well with strong classifiers. i.e. doesn't improve much on performance.
  3. Recently, some researchers have used Adaboost with strong classifiers in a clever way and have shown that it performs well on imbalanced datasets (google/bing "AdaboostSVM" ).

The following are the best descriptions of Adaboost that I found on the web.
  1. AdaBoost, Adaptive boosting.
  2. Adaboost for face detection.
  3. Experiments with a new Boosting Algorithm.
  4. AdaBoost by Jiri Matas and Jan Sochman.

Friday, May 25, 2012

Linear Discriminant Analysis using OpenCV

This is a follow up post for my small re-implementation of Linear Discriminant Analysis in OpenCV (C++). I re-implemented Stephen Marsland's Python code in C++ for my own purpose. I also took help from Philipp Wagner's code on Fisherfaces while coding this small C++ project.

LDA is a dimensionality reduction technique (and a form of supervised learning) which is used for classification.

LDA takes multi-dimensional data, makes use of prior class information (Supervised Learning) and represents the data in a form which maximizes the distance between different classes.

You may wonder "How does it do it"?

It basically takes covariance of a class (of data) with itself, mean of the entire data, mean of each class, prior probabilities of the class. LDA also uses scatter within a class, scatter in between classes and tries to best separate 2 classes of data.

I wrote a small library doing the same in OpenCV as there was no class in C++ to do Linear Discriminant Analysis.

Before trying LDA this way, make sure that you have the Eigen library installed on your system, and then install OpenCV (compile OpenCV with the use of the Eigen library enabled). This is important as OpenCV relies on the Eigen library to calculate generalized Eigen Values and Vectors.

For dimensionality reduction:
Y = W . X

where W is the weight matrix, Y is the projected vector (in lower dimensions) and X is the original feature vector (in higher dimensions).

NOTE: The dimensionality of Y is always equal to (number of classes - 1).


So, if you have 3 classes (C=3) of data (where each feature vector is of length N) and you want to project it to lower dimensions using LDA, the resultant projected vector will be of length 2 (i.e. C-1 = 3-1 = 2).
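For the simplest case of 2 classes, the projection has C-1 = 1 dimension and W reduces to a single direction. The sketch below assumes the classic two-class Fisher solution, W = Sw^-1 (mean_a - mean_b), on 2-dimensional data; it is plain Python rather than the OpenCV/Eigen code of this post, and the function names and sample points are made up for illustration.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    """2x2 within-class scatter: sum of (x - m)(x - m)^T over the class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Two-class Fisher LDA direction: inverse(Sw) * (mean_a - mean_b)."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(w, x):
    """Y = W . X for a single direction: a 2-D point becomes one number."""
    return w[0] * x[0] + w[1] * x[1]

class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 6.0), (7.0, 6.0)]
w = fisher_direction(class_a, class_b)
proj_a = [project(w, x) for x in class_a]
proj_b = [project(w, x) for x in class_b]
```

Every projected value of class_a ends up on one side of every projected value of class_b, so a single threshold on the 1-dimensional projection separates the two classes.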

To install Eigen Library give the following command:
This makes sure that you have installed Eigen library on your PC. If you are not able to install using the above command, download the deb file from here.

To get a complete overview of LDA, look at these links:
  1. FISHER LDA MDA.
  2. Linear Discriminant Analysis: A Brief Tutorial.
Download the following git repo and build it. If you have no idea on how to do it, you can follow the following steps:



Sunday, April 15, 2012

General Eigen Values in OpenCV

Calculating Generalized Eigen Values and Eigen Vectors is one of the more trivial tasks in scientific computing languages like Matlab and Python (Numpy). However, I've observed that finding Generalized Eigen Values and Vectors is one of the most common questions asked with regard to OpenCV. When you are doing something like LDA dimensionality reduction, you will need general Eigen Values and Vectors. They are also useful in many other problems you might solve in OpenCV.

Calculating them in OpenCV is not a straightforward task.

Btw, My other programs in OpenCV will be posted here.

Also, the calculated Eigen Vectors are not sorted according to the corresponding Eigen Values. I've written down a solution to do this in OpenCV.

My solution uses the Eigen3 library. It is a 3rd party library compatible with OpenCV 2.2+ (it does linear algebra). If you do not have the Eigen3 library, you can install it on your Debian system from the following link or by executing the following command:


Now that the Eigen3 library is installed, you can take the cv::Mat whose general Eigen Values and Vectors you want to calculate, convert it to a Matrix in Eigen3 (to solve the General Eigen Value problem there) and then convert the result back to cv::Mat.

Also, the following code sorts the eigen values and vectors in descending order of eigen value. Note that Matlab and Python (Numpy) do not guarantee a particular order in general, so sort their output the same way before comparing.
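As an illustration of just the sorting step (this is a plain Python sketch, not the Eigen3/OpenCV code of this post), the idea is to compute the descending order of the eigen values once and apply the same order to the eigen vector columns. The function name sort_eigenpairs and the sample numbers are made up.

```python
def sort_eigenpairs(values, vectors):
    """Sort eigen values in descending order and reorder the eigen vectors.

    `vectors` is a list of rows; column j is the eigen vector of values[j].
    """
    order = sorted(range(len(values)), key=lambda j: values[j], reverse=True)
    sorted_values = [values[j] for j in order]
    # Apply the same column permutation to every row of the vector matrix.
    sorted_vectors = [[row[j] for j in order] for row in vectors]
    return sorted_values, sorted_vectors

values = [0.5, 3.0, 1.5]
vectors = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
sorted_values, sorted_vectors = sort_eigenpairs(values, vectors)
```

Keeping the permutation in one place (the order list) is what guarantees that each eigen vector stays paired with its own eigen value.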

Matlab does it the following way:
Numpy-Scipy(Python) does it the following way:


Now doing the same in OpenCV 2.3(C++) using Eigen3 library:
** CODE UPDATED**
If you find any bug in the above code, please let me know.

Wednesday, March 7, 2012

Reading Configuration Files in C++

This is a quick code snippet to read a user defined configuration file in C++. I use the boost libraries.

There are many ways to read it (depending on how you have your configuration file written). You can also use boost::property_tree, which can efficiently parse formats such as XML, INI, etc.

Make sure you've installed boost before trying this out.

For a text file with data in the following format(data in each line separated by tab spaces and new unit of data in each line):
 



The returned data is in form of vector of vectors. You can replace above configuration file with similar structure (and replace tabs with any common delimiter and it will work).
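The same "vector of vectors" idea can be sketched in plain Python (this is not the boost-based C++ snippet of this post). The function name parse_delimited and the sample keys and values below are hypothetical; any tab-separated text with one unit of data per line is handled the same way.

```python
def parse_delimited(text, delimiter="\t"):
    """Return a list of rows, each row a list of fields; blank lines are skipped."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        rows.append(line.split(delimiter))
    return rows

# Hypothetical configuration content, one unit of data per line.
sample = "width\t640\nheight\t480\n\nmode\tfast\tdebug\n"
config = parse_delimited(sample)
```

Passing a different delimiter, for example parse_delimited(text, ","), covers the case where the tabs are replaced with another common separator.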

Saturday, March 3, 2012

Mahalanobis Distance in OpenCV and Python

In this post we discuss calculating the Mahalanobis distance in OpenCV using C++.

Here we calculate the Covariance Matrix, the Inverse Covariance Matrix and the Mahalanobis Distance using the newer C++ API and compare the results with Python (Numpy, Scipy).

Btw, My other programs in OpenCV will be posted here.

In OpenCV, it's calculated as:

For covariance:
For inverse matrix:
For Mahalanobis Distance:
In Python you use:

I have been through this post and this post, where they compute the covariance matrix in OpenCV using C++ but follow the older API structure. I double checked this implementation against their code and Numpy.

There are lots of articles on the web claiming to get wrong results when using OpenCV's API to calculate the Covariance Matrix, etc. I also ran into similar errors, but with the following code I got them fixed.

There is another variant of this function. But it looks more like C API.
Mahalanobis distance takes into account the co-variance in the variable data while calculating the distance between 2 points. This is a good example of Mahalanobis distance explanation and implementation in Matlab.

First in OpenCV:
The above implementation was checked in Numpy to see if the results were right or wrong (there are many posts in the web asking if the calccovar matrix function is buggy or request to change the calccovarmatrix function API ).

Now in Numpy:
The results match.
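To make the three steps (covariance, inverse, distance) concrete, here is a minimal sketch in plain Python for 2-dimensional data; it is not the OpenCV or Numpy code of this post. It assumes the covariance is normalized by (n - 1), which is what numpy.cov does by default, so when comparing libraries it is worth checking that both use the same normalization. The function names and data points are made up.

```python
import math

def covariance_2d(points):
    """2x2 sample covariance matrix, normalized by (n - 1)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def mahalanobis(u, v, inv_cov):
    """sqrt((u - v)^T * inv_cov * (u - v))."""
    d = [u[0] - v[0], u[1] - v[1]]
    q = (d[0] * (inv_cov[0][0] * d[0] + inv_cov[0][1] * d[1]) +
         d[1] * (inv_cov[1][0] * d[0] + inv_cov[1][1] * d[1]))
    return math.sqrt(q)

points = [(2.0, 2.0), (4.0, 6.0), (6.0, 4.0), (8.0, 8.0)]
inv_cov = inverse_2x2(covariance_2d(points))
d_along = mahalanobis(points[0], points[3], inv_cov)   # along the spread of the data
d_across = mahalanobis(points[1], points[2], inv_cov)  # across the spread of the data
```

In this data set the two pairs are far apart in Euclidean terms (about 8.49 versus 2.83), yet their Mahalanobis distances come out equal, because the distance is measured relative to how much the data varies in each direction.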

Sunday, January 29, 2012

Parameter Selection in Hidden Markov Models

Hidden Markov Models have been used in Computer Vision, Speech Recognition extensively.

In this post, I discuss the criteria for selecting the parameters of a Hidden Markov Model (HMM) for optimum training. If you still do not properly understand what an HMM is, please refer to this article on HMMs. In order to classify data, one HMM is needed per class.

Assume that we want to have a Discrete HMM represented by λ(A, B, π). Let O be the set of observations, N the number of states in the HMM and M the number of unique symbols.

P(O|λ)  = P = likelihood that an observation matches the given HMM model.


The following are some of the criteria in selecting parameters in Hidden Markov Models (taken from Bell System Journal):
  1. Initial Estimates of A and B: We know that an HMM classifier can be trained using the Baum-Welch algorithm. For the given set of observation sequences (O1, O2, ......), Baum-Welch re-estimates λ iteratively such that P is maximized. The result depends on the initial parameters of the HMM, i.e. λ: different values of A, B and π give different likelihoods. There is no one right way to select initial estimates. Hence, they can be random values (however, each row of A and B, and the vector π, should sum to 1).

  2. HMM structure and number of states: The structure of an HMM has an effect on its performance. For example in speech recognition, an HMM which is modeled from left to right is preferred. There is also no need for a state to be able to reach each and every state in the model, i.e. a few elements of A can be zero. If N is too large, reliable determination of A and B can become hard, and Viterbi decoding also becomes expensive. There is also no right way to choose N, as it need not be related to an "observable phenomenon" in the problem.

  3. Multiple Observation sequences: Multiple observation sequences (from different training examples of the same class) need to be used. Mixing training samples from different sources is a good idea.

  4. Constraints on A, B matrices during training: The matrices A, B and π can take any values (as long as they follow the constraints, such as the elements along each row of a matrix summing to 1). It's found that an HMM where a state can traverse to all the states is not so flexible. At the same time, a state transition should also not be restricted to staying "within" the state or going "only" to the next state. Such restrictions might constrain the level to which the model can be trained. For example in speech models, the transition is usually from left to right, so the matrix elements are zero below the diagonal. If the matrix B is unconstrained and the data set for training is limited, it is likely that some bj(k)=0. There are some solutions for this problem in the journal.

  5. Multiple estimates of A, B and averaging: It is known that different initial values of λ are likely to give different likelihoods for a testing set. So, models trained from different initial values of λ can be averaged (averaging the matrix elements of A, B and π). However, it was observed that the averaged model was not better than the best of the individual solutions. So, the model that has a low memory footprint and gives good results should be picked.
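Two of the points above lend themselves to a short sketch in plain Python: drawing random initial estimates whose rows sum to 1, and computing P(O|λ) so that different initial estimates can be compared. The likelihood is computed with the standard forward algorithm; the function names, the seed and the observation sequence are made up, and Baum-Welch training itself is not shown.

```python
import random

def random_stochastic_matrix(rows, cols, rng):
    """Random matrix whose rows are positive and each sum to 1."""
    matrix = []
    for _ in range(rows):
        raw = [rng.random() + 1e-3 for _ in range(cols)]  # offset avoids exact zeros
        total = sum(raw)
        matrix.append([v / total for v in raw])
    return matrix

def forward_likelihood(A, B, pi, observations):
    """P(O | lambda) for a discrete HMM, computed with the forward algorithm."""
    n = len(pi)
    # alpha[i]: probability of the observations so far, ending in state i.
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for o in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

rng = random.Random(0)                      # fixed seed, for repeatability only
N, M = 3, 4                                 # number of states, number of symbols
A = random_stochastic_matrix(N, N, rng)     # state transition probabilities
B = random_stochastic_matrix(N, M, rng)     # symbol emission probabilities
pi = random_stochastic_matrix(1, N, rng)[0] # initial state distribution
likelihood = forward_likelihood(A, B, pi, [0, 2, 1, 3])
```

Repeating this with several seeds and keeping the λ that scores best after training is one simple way to act on the observation that different initial estimates lead to different likelihoods.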

Friday, January 20, 2012

Install Boost Libraries on Ubuntu

I've heard a lot about Boost Libraries. They are well known in the programming community as one of the most well written, quite handy C++ libraries one can use. You can download them here.

For a quick install of Boost libraries on Ubuntu, use the following command:


To quickly learn about the Boost libraries, click here for a free book that is accessible online.

Tuesday, January 17, 2012

Kmeans clustering in OpenCV with C++

Kmeans clustering is one of the most widely used Unsupervised Learning algorithms. If you are not sure what Kmeans is, refer to this article. Also, if you have heard the term Vector Quantization, Kmeans is closely related to it (refer to this article to know more about it). Autonlab has a great ppt on Kmeans Clustering.

First, I'll talk about the kmeans usage in OpenCV with C++ and then I'll explain it with a program. If you are not yet comfortable with OpenCV in C++, please refer to this article; pretty much everything else is the same as in the C API (where you use IplImage*, etc.).

Btw, My other programs in OpenCV will be posted here.

Function call in C++ API of OpenCV accepts the input in following format:
double kmeans(const Mat& samples, int clusterCount, Mat& labels, TermCriteria termcrit, int attempts, int flags, Mat* centers);

Parameters explained as follows:
  1. samples: It contains the data. Each row represents a feature vector and each column in a row represents a dimension. So, we can have multiple dimensions of data in the feature vector. For example, if we have 50 feature vectors of 5 dimensions each, this matrix will have 50 rows and 5 columns. One interesting thing I've noticed is that kmeans doesn't work with the CV_64F type; the samples have to be CV_32F.
  2. clusterCount: It should be specified beforehand. We need to know how many clusters to divide the data into. It is an integer.
  3. labels: It is an output Matrix. If we had a Matrix of the size specified above (i.e. 50 x 5), we will have a 50 x 1 output Matrix. It tells which cluster each feature vector belongs to. Cluster indices run over 0, 1, .... (number of clusters-1).
  4. TermCriteria: It determines the criteria for stopping the algorithm: max iterations, accuracy, etc.
  5. attempts: number of attempts made with different initial labellings. Also refer to the documentation for more information on this parameter.
  6. flags: It can be
    KMEANS_RANDOM_CENTERS   (for random initialization of cluster centers).
    KMEANS_PP_CENTERS   (for kmeans++ version of initializing cluster centers)
    KMEANS_USE_INITIAL_LABELS   (for user defined initialization).
  7. centers: Matrix holding the center of each cluster. If we divide the 50 x 5 matrix of feature vectors into 2 clusters, we will have 2 centers, each in 5 dimensions.
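To make the roles of samples, labels and centers concrete, here is a minimal sketch of the basic kmeans iteration (Lloyd's algorithm) in plain Python. It is not OpenCV's implementation and does not use its API: the initial centers are simply assumed to be the first clusterCount samples so that the run is repeatable, and the function names and data are made up.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(samples, cluster_count, max_iter=100):
    # Assumption: use the first cluster_count samples as initial centers.
    centers = [list(s) for s in samples[:cluster_count]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each sample with its nearest center.
        new_labels = [min(range(cluster_count),
                          key=lambda k: sq_dist(s, centers[k]))
                      for s in samples]
        if new_labels == labels:   # nothing changed: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(cluster_count):
            members = [s for s, l in zip(samples, labels) if l == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# 6 feature vectors of 2 dimensions each (6 rows, 2 columns).
samples = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
           (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = kmeans(samples, cluster_count=2)
```

Here labels has one entry per row of samples (6 values from 0 to 1) and centers has one row per cluster (2 rows of 2 dimensions), mirroring the shapes described in the parameter list.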
Sample program is explained as follows:


Friday, January 13, 2012

Access Pixels in C++ using OpenCV

I'm writing this blog post because I've seen many people face a lot of confusion while moving from the C API to the C++ API.

This post explains how to use the cv::Mat datatype to access pixels of an Image or elements of a Matrix. An example of cv::Mat_ is also shown.

Btw, My other programs in OpenCV will be posted here.

Many people still write code using the IplImage* or CvMat* datatypes. The sight of pointers scares a lot of people (except for some). By moving to the C++ API, one can make use of C++ references, which are easier to understand, and one can also use Object Oriented Programming techniques to make code easier to write and more flexible.

Moving to the C++ API hasn't been that tough for me, but it definitely took a while. Now, my code looks cleaner than before. I recommend everyone to move to the C++ API using the cv::Mat type.

Sometimes, when you don't want to deal with the return type of the Matrix while accessing the elements (i.e. without the use of ".at" operator), you can use cv::Mat_ (which is a template sub class of cv::Mat).

Taking care of the data-type of Matrix elements is really important. I didn't care much about it while using the C API. But C++ API makes you more aware of the data-type of elements of the Matrix. This is for good as it encourages efficient usage of memory.

To use C++ API, you need the following headers and namespace.

For faster ways of accessing pixels, refer to the OpenCV documentation.

The following code is used to do the following with the values of pixels in Images or elements in Matrices of cv::Mat type in OpenCV:
  1. Declare.
  2. Assign.
  3. Print or access.

Declaring Matrices: You have to know the data-type of elements of the Matrix before accessing or printing them.

For assigning values:

For printing values (same as above):

Thursday, January 12, 2012

Installing GHMM

There seem to be some dependency issues while installing GHMM on Linux (Ubuntu). The following script installs GHMM (an HMM library written in C with a Python wrapper and a GUI).

I am running Ubuntu 10.04 LTS.