Addressing the Overlapping Data Problem in Classification Using the One-vs-One Decomposition Strategy

Learning well-performing classifiers from data with easily separable classes is usually not difficult for most algorithms. However, classifier performance can suffer when samples from different classes share similar characteristics or overlap, since the class boundaries may then not be clearly defined. To address this problem, most existing works either adapt well-known algorithms to reduce the negative impact of overlapping or modify the original data by adding or removing features so that the overlapping region shrinks. These approaches have drawbacks: changes made to one algorithm may not carry over to other methods, and modifying the original data can produce variable results depending on the data characteristics and on the technique applied afterwards. An unexplored and interesting research line for dealing with overlapping is to decompose the problem into several binary subproblems, which reduces its complexity and diminishes the negative effects of overlapping. Based on this idea, which is novel in the field of overlapping data, this paper proposes using the One-vs-One (OVO) strategy to alleviate the effects of overlapping without modifying existing algorithms or the data, as previous works do. To test the suitability of the OVO approach with overlapping data, and given the lack of proposals in the specialized literature, this research also introduces a novel scheme to artificially induce overlapping in real-world datasets, which makes it possible to simulate different types and levels of overlapping among the classes. The results show that methods using OVO perform better on data with overlapped classes than methods that deal with all classes at the same time.
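As an illustration of the decomposition idea, the following is a minimal sketch of OVO in Python. It is not the implementation used in this paper: the nearest-centroid base classifier, the majority-vote aggregation, the function names (`train_ovo`, `predict_ovo`) and the toy data are all assumptions chosen to keep the example short and self-contained.

```python
from collections import Counter
from itertools import combinations


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_ovo(X, y):
    """Train one binary model per pair of classes (OVO decomposition).

    Assumption: the base learner is a nearest-centroid classifier;
    any binary classifier could take its place.
    """
    classes = sorted(set(y))
    models = {}
    for a, b in combinations(classes, 2):
        # Each binary subproblem only sees the samples of its two classes.
        pts_a = [x for x, label in zip(X, y) if label == a]
        pts_b = [x for x, label in zip(X, y) if label == b]
        models[(a, b)] = (centroid(pts_a), centroid(pts_b))
    return models


def predict_ovo(models, x):
    """Aggregate the pairwise decisions by simple majority voting."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        winner = a if sq_dist(x, ca) <= sq_dist(x, cb) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three classes in a two-dimensional feature space.
X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9], [0.0, 5.0], [0.1, 5.2]]
y = ["a", "a", "b", "b", "c", "c"]
models = train_ovo(X, y)
```

With k classes, this builds k(k-1)/2 binary models, so the three-class toy problem yields three subproblems. A new sample is then labeled with a call such as `predict_ovo(models, [0.1, 0.0])`.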
