Adaptive sampling for large scale boosting

Classical boosting algorithms, such as AdaBoost, build a strong classifier without concern for computational cost. Some applications, in particular in computer vision, may involve millions of training examples and very large feature spaces; in such contexts, the training time of off-the-shelf boosting algorithms may become prohibitive. Several methods exist to accelerate training, typically by sampling either the features or the examples used to train the weak learners. Even when these methods provide a guaranteed speed-up, they offer no assurance of being more efficient than the others given the same time budget. The contributions of this paper are twofold: (1) a strategy to better handle the increasingly common case where features come from multiple sources (for example, color, shape, and texture in the case of images) and can therefore be partitioned into meaningful subsets; (2) new algorithms that, at every boosting iteration, balance the number of weak learners and the number of training examples to examine in order to maximize the expected loss reduction. Experiments in image classification and object recognition on four standard computer vision data sets show that the adaptive methods we propose outperform basic sampling and state-of-the-art bandit methods.
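To make the example-and-feature sampling idea concrete, the sketch below shows an AdaBoost-style loop in which each round trains decision stumps on a random subset of features and a weight-proportional subset of examples. All names (sampled_boosting, stump_predict) and the fixed subset sizes n_features and n_examples are illustrative assumptions, not the paper's algorithm; the paper's contribution is precisely to choose those two sizes adaptively at every iteration rather than fixing them up front.

```python
# Minimal sketch (not the authors' exact method) of boosting with
# per-round sampling of both features and examples.
import numpy as np

def stump_predict(x_col, threshold, polarity):
    """Decision stump: predict +1/-1 by thresholding a single feature."""
    return polarity * np.where(x_col > threshold, 1.0, -1.0)

def sampled_boosting(X, y, n_rounds=50, n_features=10, n_examples=200, seed=0):
    """AdaBoost where each round only examines a sampled subset of the data.

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    n_features / n_examples are fixed here for illustration; the paper
    proposes balancing them adaptively at every iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(n, 1.0 / n)         # boosting weights over examples
    ensemble = []                   # (alpha, feature, threshold, polarity)
    for _ in range(n_rounds):
        # Sample a subset of features uniformly, and a subset of examples
        # in proportion to their boosting weights.
        feats = rng.choice(d, size=min(n_features, d), replace=False)
        idx = rng.choice(n, size=min(n_examples, n), replace=True, p=w)
        best = None                 # (error, feature, threshold, polarity)
        for f in feats:
            for t in np.quantile(X[idx, f], [0.25, 0.5, 0.75]):
                for pol in (1.0, -1.0):
                    # Plain error on the weighted sample estimates the
                    # weighted error on the full training set.
                    err = np.mean(stump_predict(X[idx, f], t, pol) != y[idx])
                    if best is None or err < best[0]:
                        best = (err, f, t, pol)
        err, f, t, pol = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        ensemble.append((alpha, f, t, pol))
        # Standard AdaBoost weight update, applied to the full set.
        w *= np.exp(-alpha * y * stump_predict(X[:, f], t, pol))
        w /= w.sum()
    return ensemble

def predict(ensemble, X):
    score = sum(a * stump_predict(X[:, f], t, pol) for a, f, t, pol in ensemble)
    return np.sign(score)
```

Sampling examples in proportion to their boosting weights is what keeps subsampling compatible with the boosting objective: the unweighted error measured on the sample is an unbiased estimate of the weighted error on the full training set.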
