Simultaneous two-sample learning to address binary class imbalance problem in low-resource scenarios

The binary class imbalance problem refers to the scenario in which the number of training samples in one class is much lower than the number of samples in the other class. This imbalance prevents conventional machine learning algorithms from classifying accurately. Moreover, many real-world training datasets are not only imbalanced but also low-resourced. In this paper, we introduce a novel technique to handle the class imbalance problem, even in low-resource scenarios. In our approach, instead of learning from one sample at a time, as is common, we train the classifier on two samples considered simultaneously. This simultaneous two-sample learning appears to help the classifier learn both intra-class and inter-class properties. Experiments on a large number of benchmark datasets show that our technique outperforms existing state-of-the-art techniques.
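The abstract does not specify how pairs are represented, labelled, or classified. The following is only a minimal sketch of the general two-sample idea under stated assumptions. It assumes that each training pair is the concatenation of two feature vectors and is labelled with the ordered pair of class labels, so that (0,0) and (1,1) pairs carry intra-class information and (0,1) and (1,0) pairs carry inter-class information. It also assumes that a test sample is classified by pairing it with every reference sample and voting on its own label. The names `make_pairs`, `NearestCentroid`, and `predict_binary` are hypothetical, and the nearest-centroid classifier is a stand-in. None of this should be read as the authors' implementation.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def make_pairs(X: Sequence[Vector], y: Sequence[int]) -> Tuple[List[Vector], List[int]]:
    """Hypothetical two-sample training set: every ordered pair (x_i, x_j).

    Assumption: a pair is the concatenation x_i ++ x_j, labelled 2*y_i + y_j,
    i.e. 0=(0,0), 1=(0,1), 2=(1,0), 3=(1,1).
    """
    pairs, labels = [], []
    for i, j in product(range(len(X)), repeat=2):
        pairs.append(list(X[i]) + list(X[j]))
        labels.append(2 * y[i] + y[j])
    return pairs, labels


class NearestCentroid:
    """Stand-in classifier (hypothetical choice): predicts the closest class mean."""

    def fit(self, X: Sequence[Vector], y: Sequence[int]) -> "NearestCentroid":
        sums: Dict[int, Vector] = {}
        counts = Counter(y)
        for x, c in zip(X, y):
            acc = sums.setdefault(c, [0.0] * len(x))
            for k, v in enumerate(x):
                acc[k] += v
        self.centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, x: Vector) -> int:
        def sq_dist(c: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def predict_binary(model: NearestCentroid, X_ref: Sequence[Vector], x: Vector) -> int:
    """Pair x with every reference sample; the first element of each predicted
    pair label (pair_label // 2) is a vote for x's class; majority wins."""
    votes = Counter(model.predict_one(list(x) + list(r)) // 2 for r in X_ref)
    return votes.most_common(1)[0][0]


# Toy imbalanced data: four majority samples (class 0), one minority sample (class 1).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0], [0.9, 1.0]]
y = [0, 0, 0, 0, 1]

P, L = make_pairs(X, y)
print(len(X), "->", len(P))           # 5 -> 25
print(sorted(Counter(L).items()))     # [(0, 16), (1, 4), (2, 4), (3, 1)]

model = NearestCentroid().fit(P, L)
print(predict_binary(model, X, [1.0, 0.9]))  # 1
print(predict_binary(model, X, [0.1, 0.1]))  # 0
```

In this toy example the minority sample appears in 1 of the 5 original samples but in 9 of the 25 ordered pairs. This is one way that pairing can give a small minority class more presence in the training set. Whether the paper relies on this effect is not stated in the abstract.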
