## Monday, July 30, 2012

Adaboost is an ensemble supervised learning technique that aims to improve the performance of a set of weak classifiers.

Adaboost is a simple concept: make a decision based on the decisions output by a set of weak classifiers. It turns out that a set of weak classifiers can be combined to form a strong classifier.

I wrote an implementation of this algorithm in Python (it was basically a re-implementation of an adaboost tutorial that I found very useful, and which I later used in a project of mine).

Assuming

X = {(x1, y1), (x2, y2), (x3, y3), ..., (xt, yt)}, where each xi is a feature vector of length N.

Y is the label for each data point, so Y would be something like {+1, -1, +1, ...}, assuming this is a binary classification problem.
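For instance, a tiny made-up dataset in this format might look like the following (the values here are arbitrary, just to illustrate the shape of the data):

```python
# A toy dataset in the format described above: each entry is a pair
# (feature vector of length N = 2, label in {+1, -1}).
X = [
    ([0.2, 1.5], +1),
    ([1.1, 0.3], -1),
    ([0.9, 2.0], +1),
    ([2.4, 0.1], -1),
]
```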

The final output of the algorithm:

1. T weak classifiers, indexed t = 1 to T.
2. αt is the weight assigned to the classifier at iteration t.
3. ht(x) is the weak classifier at iteration t.
4. H(x) is the sign of f(x) = Σt αt ht(x), i.e. the classification output for feature vector x.
Apparently, you can also consider ht(x) to be a "weak feature", though I can't explain that because I haven't encountered such a situation.

The goal of training is to find the right αt for each weak classifier.
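To make the combination step concrete, here is a minimal sketch of H(x) in Python. The ensemble below is entirely hypothetical (made-up αt values and stump-like rules); in practice the αt and ht come out of the training procedure described below:

```python
# Hypothetical trained ensemble: pairs (alpha_t, h_t), where each weak
# classifier h_t maps a feature vector x to +1 or -1. The weights and
# decision rules here are made up for illustration only.
ensemble = [
    (0.9, lambda x: 1 if x[0] > 0.5 else -1),
    (0.4, lambda x: 1 if x[1] > 0.2 else -1),
]

def H(x):
    """Strong classifier: the sign of f(x) = sum_t alpha_t * h_t(x)."""
    f = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if f >= 0 else -1

print(H([0.8, 0.1]))  # votes: +0.9 and -0.4, so f = 0.5 -> +1
```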

The training algorithm:

In adaboost, a new weak classifier is trained at each iteration t (from t = 1 to T). Initially, each datapoint is assigned the weight 1/(number of datapoints) and passed to the classifier.

Do the following for t = 1 to T:
1. Train the weak classifier ht on the weighted data; it returns a classification result ht(xi) for each datapoint.
2. Error(t) = εt = sum of the weights Dt(i) of the mis-classified points.
3. If εt ≥ 1/2, break: the weak classifier is doing no better than random guessing.
4. Set αt = (1/2) ln((1 − εt) / εt).
5. Update the weights: Dt+1(i) = Dt(i) exp(−αt yi ht(xi)) / Zt. The weight of each correctly classified datapoint shrinks (it is multiplied by e^−αt), while the weight of each mis-classified datapoint grows (it is multiplied by e^αt).
6. Zt is a normalization factor chosen so that the weights Dt+1(i) sum to 1.

It is important to note that the data we operate on is the same in each iteration. What changes from iteration to iteration are Error(t), Dt, and αt; re-assigning these variables each iteration is what drives Adaboost.
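The training loop above can be sketched in Python using decision stumps as the weak classifiers (a common choice; any weak learner would do). This is an illustrative sketch, not the tutorial implementation mentioned earlier:

```python
import numpy as np

def train_stump(X, y, D):
    """Exhaustively search for the decision stump (feature, threshold,
    polarity) with the lowest weighted error under sample weights D."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thresh in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)
                err = D[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thresh, polarity)
    return best, best_err

def stump_predict(stump, X):
    j, thresh, polarity = stump
    return np.where(polarity * (X[:, j] - thresh) >= 0, 1, -1)

def adaboost_train(X, y, T):
    """Run T boosting rounds; returns a list of (alpha_t, h_t) pairs."""
    n = len(y)
    D = np.full(n, 1.0 / n)              # initial uniform weights
    ensemble = []
    for t in range(T):
        stump, err = train_stump(X, y, D)
        if err >= 0.5:                   # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        pred = stump_predict(stump, X)
        D *= np.exp(-alpha * y * pred)   # grow weights of mistakes,
        D /= D.sum()                     # then normalize (the Z_t step)
        ensemble.append((alpha, stump))
    return ensemble

def adaboost_predict(ensemble, X):
    f = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.sign(f)
```

For example, on a one-dimensional dataset where the labels flip at a single threshold, a single stump already classifies everything correctly, and the ensemble reproduces the labels exactly.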

So, the final output is a weighted linear sum of weak classifiers which, used together, give the performance of a strong classifier.

Interesting facts:
1. It has been observed that Adaboost generally doesn't overfit the data (which is a good thing).
2. It is said that Adaboost doesn't work well with strong classifiers, i.e. it doesn't improve their performance much.
3. Recently, some researchers have used Adaboost with strong classifiers in a clever way and have shown that it performs well on imbalanced datasets (google/bing "AdaboostSVM").

The following are the best descriptions of Adaboost that I found on the web.