On the Foundations of Adversarial Single-Class Classification

Motivated by applications in authentication, intrusion detection, and spam detection, we consider single-class classification (SCC) as a two-person game between the learner and an adversary. In this game the learner has a sample from a target distribution, and the goal is to construct a classifier capable of distinguishing observations drawn from the target distribution from observations emitted by an unknown other distribution. The ideal SCC classifier must guarantee a given tolerance for the false-positive error (false alarm rate) while minimizing the false-negative error (intruder pass rate). Viewing SCC as a two-person zero-sum game, we identify both deterministic and randomized optimal classification strategies for different game variants. We demonstrate that randomized classification can provide a significant advantage. In the deterministic setting we show how to reduce SCC to two-class classification, where the other class in the two-class problem is a synthetically generated distribution. We provide an efficient and practical algorithm for constructing and solving the two-class problem. The algorithm distinguishes low-density regions of the target distribution and is shown to be consistent.
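
The reduction of SCC to two-class classification can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes one-dimensional data, a uniform synthetic "other" class drawn over a widened bounding interval, and a simple k-nearest-neighbor score in place of a real two-class learner. The names `target_score`, `fit_scc`, `delta`, `k`, and `margin` are hypothetical and chosen only for illustration. The acceptance threshold is calibrated on held-out target points so that roughly a fraction `delta` of them is rejected, which mirrors the false-alarm tolerance described above.

```python
import random


def target_score(x, target, synthetic, k):
    """Fraction of the k training points nearest to x that are target points."""
    labeled = [(abs(x - t), 1) for t in target] + [(abs(x - s), 0) for s in synthetic]
    labeled.sort(key=lambda pair: pair[0])
    return sum(label for _, label in labeled[:k]) / k


def fit_scc(sample, delta=0.1, k=25, seed=0):
    """Build an accept/reject rule from a target sample only.

    Assumption (not from the paper): the synthetic class is uniform on the
    sample range widened by half its width on each side.
    """
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    train, calib = shuffled[:half], shuffled[half:]

    # Synthetic "other" class: uniform points covering and surrounding the data.
    lo, hi = min(sample), max(sample)
    margin = 0.5 * (hi - lo)
    synthetic = [rng.uniform(lo - margin, hi + margin) for _ in range(len(train))]

    # Calibrate the threshold so that about a delta fraction of held-out
    # target points scores strictly below it (the false-alarm tolerance).
    scores = sorted(target_score(x, train, synthetic, k) for x in calib)
    threshold = scores[int(delta * len(scores))]

    def accept(x):
        # Low scores correspond to low-density regions of the target sample.
        return target_score(x, train, synthetic, k) >= threshold

    return accept
```

Calling `fit_scc(sample, delta=0.1)` on a list of floats returns a function `accept(x)` that rejects points falling in regions where the synthetic class dominates the target sample. The uniform synthetic class is only one possible choice; the sketch makes no claim about which synthetic distribution is optimal against an adversary.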
