Adaboost is built on a simple idea: make a decision by combining the decisions output by a set of weak classifiers. It turns out that such a set of weak classifiers can form a strong classifier.
I made an implementation of this algorithm in Python (which was basically a re-implementation of an Adaboost tutorial that I found very useful, and which I later used in a project of mine).
Assuming
X = {(x_{1}, y_{1}), (x_{2}, y_{2}), (x_{3}, y_{3}), ..., (x_{m}, y_{m})}, where each x_{i} is a feature vector of length N and m is the number of data points.
y_{i} is the label of each data point, so the labels look like {+1, -1, +1, ...}, assuming this is a binary classification problem.
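For concreteness, a tiny dataset in this shape might look like the following (the values here are made up for illustration):

```python
import numpy as np

# Hypothetical toy dataset: 4 data points, each a feature vector of
# length N = 2, with binary labels in {+1, -1}.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [-1.0, -2.0],
              [-2.0, -1.0]])
Y = np.array([+1, +1, -1, -1])

# One label per feature vector.
assert X.shape[0] == Y.shape[0]
```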
The final output of the algorithm is f(x) = Σ_{t=1}^{T} α_{t} h_{t}(x).
In the adaboost testing phase:
- t runs from 1 to T, indexing the T weak classifiers.
- α_{t} is the weight assigned to the classifier trained at iteration t.
- h_{t}(x) is the weak classifier's output for feature vector x.
- H(x) = sign(f(x)) is the classification output for feature vector x.
The goal of the training will be to find the right α_{t}.
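The testing phase above can be sketched in Python. The two weak classifiers and their weights here are hypothetical stand-ins, just to show the weighted vote:

```python
def adaboost_predict(x, alphas, classifiers):
    """Strong-classifier output H(x) = sign(f(x)),
    where f(x) = sum over t of alpha_t * h_t(x)."""
    f = sum(alpha * h(x) for alpha, h in zip(alphas, classifiers))
    return 1 if f >= 0 else -1

# Two hypothetical weak classifiers (decision stumps on feature 0 and
# feature 1), with made-up weights:
h1 = lambda x: 1 if x[0] > 0 else -1
h2 = lambda x: 1 if x[1] > 0 else -1
print(adaboost_predict([0.5, -1.0], alphas=[0.8, 0.3],
                       classifiers=[h1, h2]))  # → 1  (f = 0.8 - 0.3 > 0)
```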
The training algorithm:
In adaboost, a new weak classifier is trained at each iteration t (from t = 1 to T). Initially, each data point is assigned the weight D_{1}(i) = 1/m (where m is the number of data points) and passed to the classifier.
Do the following for t = 1 to T:
- The classifier returns the classification result for each data point.
- Error(t) = ε_{t} = sum of the weights D_{t}(i) of the mis-classified points.
- If Error(t) ≥ 1/2, stop: the weak classifier is no better than random guessing.
- Set α_{t} = (1/2) ln((1 - ε_{t}) / ε_{t}).
- Update every weight: D_{t+1}(i) = (D_{t}(i) * exp(-α_{t} y_{i} h_{t}(x_{i}))) / Z_{t}, so the weights of mis-classified points increase and the weights of correctly classified points decrease.
- Z_{t} is a normalization factor chosen so that Σ_{i} D_{t+1}(i) = 1.
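The training loop above can be sketched as follows. The `weak_learner` argument is a hypothetical helper, assumed to fit some weak classifier to the weighted data and return a function h with h(x) in {+1, -1}:

```python
import numpy as np

def adaboost_train(X, Y, weak_learner, T):
    """Sketch of the AdaBoost training loop, assuming
    weak_learner(X, Y, D) returns a weak classifier h."""
    m = len(X)
    D = np.full(m, 1.0 / m)              # initial uniform weights, D_1(i) = 1/m
    alphas, classifiers = [], []
    for t in range(T):
        h = weak_learner(X, Y, D)
        preds = np.array([h(x) for x in X])
        error = D[preds != Y].sum()      # Error(t): weight of mis-classified points
        if error >= 0.5:                 # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - error) / max(error, 1e-10))
        # Mis-classified weights go up, correct ones down, then normalize.
        D = D * np.exp(-alpha * Y * preds)
        D /= D.sum()                     # Z_t normalization
        alphas.append(alpha)
        classifiers.append(h)
    return alphas, classifiers
```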
It is important to note that the data we operate on is the same in each iteration. What changes from iteration to iteration are Error(t), D_{t}, and α_{t}; these variables are re-assigned at each iteration, which is vital to Adaboost.
So the final output is a weighted linear sum of weak classifiers which, used together, give the performance of a strong classifier.
Interesting facts:
- It is observed that Adaboost doesn't generally overfit the data (which is a good thing).
- It is said that Adaboost doesn't work well with strong classifiers, i.e. it doesn't improve their performance much.
- Recently, some researchers have used Adaboost with strong classifiers in a clever way and have shown that it performs well on imbalanced datasets (google/bing "AdaboostSVM" ).
The following are the best descriptions of Adaboost that I found on the web.