Boosted Near-miss Under-sampling on SVM ensembles for concept detection in large-scale imbalanced datasets

To address the challenges of using SVMs to learn concepts from large-scale imbalanced datasets, we propose a new method: Boosted Near-miss Under-sampling on SVM ensembles (BNU-SVMs). BNU-SVMs follows the under-sampling ensemble framework: a sequence of SVMs is trained, and the training set for each base SVM is selected by a Boosted Near-miss Under-sampling technique. More specifically, the weights over negative examples are adaptively updated so that the most near-miss negative examples in the output space are selected in each iteration. Because under-sampling balances and reduces the training set while the ensemble improves classifier performance, BNU-SVMs is a promising solution to the large-scale imbalance problem. Moreover, the negative examples selected by BNU-SVMs include both the most representative ones from the data-distribution perspective and the easily misclassified ones from the accuracy perspective, so BNU-SVMs is expected to outperform existing methods. In addition, to reduce the computational cost incurred by high-dimensional visual features, we propose a kernel-distance pre-computation technique that further improves the efficiency of BNU-SVMs. Experiments on TRECVID benchmark datasets show that BNU-SVMs significantly outperforms previous methods, demonstrating that it is both an effective and an efficient solution to concept detection in large-scale imbalanced datasets.
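
The abstract does not give the weight-update rule, the selection criterion, or the exact form of the kernel pre-computation, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes an RBF kernel whose Gram matrix is computed once and sliced for every base learner, an exponential weight update driven by each negative example's decision score, and selection of the highest-weight negatives in each round. The base learner is a bias-free kernel SVM trained by dual coordinate ascent, used here as a simple stand-in for a full solver such as LIBSVM. All function names and parameters (`rbf_gram`, `fit_svm_dual`, `bnu_svms`, `ensemble_scores`, `gamma`, `rounds`) are hypothetical.

```python
import numpy as np

def rbf_gram(X, gamma):
    # Kernel matrix computed once and reused by every base learner
    # (assumed form of the kernel pre-computation).
    sq = np.sum(X * X, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-gamma * d2)

def fit_svm_dual(K, y, C=1.0, epochs=50):
    # Bias-free kernel SVM via dual coordinate ascent (stand-in solver).
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        for i in range(len(y)):
            margin = y[i] * np.dot(alpha * y, K[i])
            alpha[i] = np.clip(alpha[i] + (1.0 - margin) / K[i, i], 0.0, C)
    return alpha

def bnu_svms(X, y, rounds=5, gamma=0.5, C=1.0, seed=0):
    K = rbf_gram(X, gamma)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == -1)
    w = np.full(len(neg), 1.0 / len(neg))       # weights over negatives
    rng = np.random.default_rng(seed)
    chosen = rng.choice(neg, size=len(pos), replace=False)  # first round: random
    models = []
    for _ in range(rounds):
        idx = np.concatenate([pos, chosen])      # balanced training subset
        alpha = fit_svm_dual(K[np.ix_(idx, idx)], y[idx], C)
        models.append((idx, alpha))
        scores = K[:, idx] @ (alpha * y[idx])    # output-space scores, all points
        # Assumed update: negatives scored closer to the positive side gain weight.
        w = w * np.exp(np.clip(scores[neg], -5.0, 5.0))
        w = w / w.sum()
        chosen = neg[np.argsort(-w)[:len(pos)]]  # highest-weight negatives
    return models, K, w

def ensemble_scores(K, y, models):
    # Average the base-SVM decision scores over the ensemble.
    return np.mean([K[:, idx] @ (a * y[idx]) for idx, a in models], axis=0)
```

Each base SVM here trains on only twice as many examples as there are positives, and it reads its kernel values from the shared matrix rather than recomputing them. For datasets too large to hold the full Gram matrix in memory, a real implementation would need a blocked or cached variant, which this sketch does not attempt.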
