Sampling a Longer Life: Binary versus One-class classification Revisited

When faced with imbalanced domains, practitioners have one of two choices: if the imbalance is manageable, sampling or other corrective measures can be used in conjunction with binary classifiers (BCs); beyond a certain point, however, the imbalance becomes too extreme and one-class classifiers (OCCs) are required. Whilst the literature offers many advances in algorithms and understanding, there remains a need to connect these theoretical advances to the most practical of decisions. Specifically, given a dataset with some level of complexity and imbalance, which classification approach should be applied? In this paper, we establish a relationship between these facets in order to help guide the decision of when to apply OCCs versus BCs. Our results show that sampling gives BCs an edge over OCCs on complex domains, whereas OCCs are a good choice on less complex domains that exhibit unimodal properties. Class overlap, on the other hand, has a more uniform impact across all methods.
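To make the two options concrete, the sketch below contrasts a sampling-plus-binary-classifier pipeline with a one-class classifier trained on the majority class alone, using a synthetic imbalanced problem. This is a minimal illustration, not the paper's experimental setup: it assumes scikit-learn and imbalanced-learn are available, and SMOTE, SVC, and OneClassSVM stand in for whichever sampler and learners a practitioner would actually compare.

```python
# Minimal sketch (not the authors' pipeline): sampling + binary classifier
# versus a one-class classifier on an imbalanced dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, OneClassSVM
from sklearn.metrics import f1_score
from imblearn.over_sampling import SMOTE  # assumes imbalanced-learn is installed

# Synthetic problem with a 5% minority class (class 1).
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Option 1: oversample the minority class, then train a binary classifier.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
bc = SVC(gamma="scale").fit(X_res, y_res)
print("SMOTE + BC, F1 on minority:", f1_score(y_te, bc.predict(X_te)))

# Option 2: train a one-class classifier on the majority class only and
# treat rejected points (-1) as minority predictions.
occ = OneClassSVM(gamma="scale", nu=0.05).fit(X_tr[y_tr == 0])
y_pred = (occ.predict(X_te) == -1).astype(int)
print("OCC, F1 on minority:", f1_score(y_te, y_pred))
```

In this framing, the paper's question reduces to which of the two printed scores one should expect to be higher, given the dataset's complexity, overlap, and degree of imbalance.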
