Learning objects from small and imbalanced datasets with Boost-BFKO

One of the main drawbacks of boosting is overfitting and poor predictive accuracy when the training dataset is small and imbalanced. In this paper, we introduce Boost-BFKO, a novel learning algorithm that combines boosting with data generation and is suited to small, imbalanced training datasets. To enlarge the training set, Boost-BFKO uses an adaptive Balanced Feature Knockout (BFKO) procedure to generate new synthetic samples. To enrich the training set, Boost-BFKO selects seed samples from the minority class and rebalances the total weights of the different classes in the updated training set. Experiments on the Caltech 101 database show that our method achieves good performance when only a few training samples are available, for both binary and multi-class object classification.
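
The procedure described above (knockout-based synthetic sample generation from minority-class seeds, followed by class-weight rebalancing inside a boosting loop) can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not the paper's reference implementation: the function names (`feature_knockout`, `boost_bfko`), the choice of decision stumps as weak learners, and the exact knockout and rebalancing rules are assumptions made for illustration.

```python
# Hypothetical sketch of a Boost-BFKO-style training loop.
# Names, weak learner, and knockout/rebalancing details are assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def feature_knockout(seed, donor, rng):
    """Create a synthetic sample by replacing one randomly chosen feature
    of a seed sample with the corresponding feature of a donor sample."""
    synthetic = seed.copy()
    k = rng.integers(len(seed))
    synthetic[k] = donor[k]
    return synthetic

def boost_bfko(X, y, n_rounds=10, n_synthetic=20, random_state=0):
    """AdaBoost-style loop: each round adds knockout samples generated from
    minority-class seeds, then rebalances the per-class weight totals."""
    rng = np.random.default_rng(random_state)
    X, y = X.copy(), y.copy()
    w = np.full(len(y), 1.0 / len(y))
    learners, alphas = [], []

    for _ in range(n_rounds):
        # 1. Generate synthetic samples from minority-class seeds (BFKO step).
        minority = min(np.unique(y), key=lambda c: np.sum(y == c))
        idx = np.flatnonzero(y == minority)
        seeds = rng.choice(idx, size=n_synthetic, replace=True)
        donors = rng.choice(idx, size=n_synthetic, replace=True)
        X_syn = np.array([feature_knockout(X[s], X[d], rng)
                          for s, d in zip(seeds, donors)])
        X = np.vstack([X, X_syn])
        y = np.concatenate([y, np.full(n_synthetic, minority)])
        w = np.concatenate([w, np.full(n_synthetic, w.mean())])

        # 2. Rebalance so each class holds an equal share of the total weight.
        target = w.sum() / len(np.unique(y))
        for c in np.unique(y):
            mask = y == c
            w[mask] *= target / w[mask].sum()
        w /= w.sum()

        # 3. Standard discrete-AdaBoost step with a weak learner (stump).
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(alpha * (pred != y))   # upweight misclassified samples
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)

    return learners, alphas
```

The knockout step here follows the general feature-knockout idea of swapping a single feature value between two same-class samples, and the rebalancing step scales each class's weights so every class carries an equal share of the total weight before the weak learner is trained; the actual Boost-BFKO algorithm may differ in both respects.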
