Learning over subconcepts: Strategies for 1‐class classification

In machine learning research and application, multiclass classification algorithms reign supreme. A fundamental property of these algorithms is that they rely on data from all known categories being available to induce effective classifiers. Unfortunately, data from so‐called real‐world domains do not always satisfy this property, and researchers use methods such as sampling to make the data more conducive to classification. However, there are scenarios in which even such explicit methods for rectifying class distributions fail. In such cases, 1‐class classification algorithms become the practical alternative. Unfortunately, domain complexity severely impairs their ability to produce effective classifiers. This article addresses that issue and develops a strategy that allows for 1‐class classification over complex domains. In particular, we introduce the notion of learning along the lines of the underlying domain concepts: an important source of complexity in a domain is the presence of subconcepts, and by learning over these subconcepts explicitly, rather than over the entire domain as a whole, we can produce powerful 1‐class classification systems. Because the level of knowledge about these subconcepts naturally varies by domain, we develop three distinct methodologies that take the amount of available domain knowledge into account. We demonstrate these methodologies on three real‐world domains.
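
The abstract does not say which algorithms the three methodologies use, so the sketch below is only a hypothetical illustration of the general idea of learning over subconcepts, not the authors' method. It assumes the subconcepts are unknown and discovers them with plain k-means (deterministic farthest-first seeding), then fits one very simple 1-class model per cluster: a hypersphere given by the cluster centroid and the largest training distance to it, optionally scaled by a `slack` factor. A point is accepted if any subconcept model accepts it. The names `kmeans`, `SubconceptOneClass`, and `slack` are invented for this example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty set of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def farthest_first_init(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic seeding: start at points[0], then repeatedly add the
    point whose distance to its nearest chosen centroid is largest."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def assign(points: Sequence[Point], centroids: List[Point]) -> List[List[Point]]:
    """Put every point in the cluster of its nearest centroid."""
    clusters: List[List[Point]] = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[j].append(p)
    return clusters


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100):
    """Lloyd's k-means; used here only as a stand-in for subconcept discovery."""
    centroids = farthest_first_init(points, k)
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Keep the previous centroid if a cluster ends up empty.
        new = [mean(c) if c else old for c, old in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)


class SubconceptOneClass:
    """Hypothetical sketch: one hypersphere (centroid, radius) per discovered
    subconcept; k=1 reduces to a single model over the whole domain."""

    def __init__(self, k: int, slack: float = 1.0):
        self.k = k
        self.slack = slack
        self.spheres: List[Tuple[Point, float]] = []

    def fit(self, target_points: Sequence[Point]) -> "SubconceptOneClass":
        centroids, clusters = kmeans(target_points, self.k)
        self.spheres = [
            (c, self.slack * max(math.dist(p, c) for p in members))
            for c, members in zip(centroids, clusters)
            if members
        ]
        return self

    def accepts(self, x: Point) -> bool:
        # Accept x if it falls inside the sphere of at least one subconcept.
        return any(math.dist(x, c) <= r for c, r in self.spheres)


# Toy target class made of two well-separated subconcepts (made-up data).
blob_a = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), (0.0, -0.5)]
blob_b = [(10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (9.5, 10.0), (10.0, 9.5)]
target = blob_a + blob_b

whole_domain = SubconceptOneClass(k=1).fit(target)
per_subconcept = SubconceptOneClass(k=2).fit(target)

gap = (5.0, 5.0)  # empty region between the two subconcepts
print(whole_domain.accepts(gap))           # True
print(per_subconcept.accepts(gap))         # False
print(per_subconcept.accepts((0.2, 0.1)))  # True
```

In this toy setting, the single whole-domain sphere is centred on the empty gap and therefore accepts it, while the two per-subconcept spheres reject it. Here `k` stands in for the domain knowledge about subconcepts, which the abstract notes will vary from domain to domain.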
