Transductive anomaly detection

One formulation of the anomaly detection problem is to build a detector based on a training sample consisting only of nominal data. The standard approach to this problem has been to declare anomalies where the nominal density is low, which reduces the problem to density level set estimation. This approach is inductive in the sense that the detector is constructed before any test data are observed. In this paper, we consider the transductive setting, where the unlabeled and possibly contaminated test sample is also available at learning time. We argue that anomaly detection in this transductive setting is naturally solved by a general reduction to a binary classification problem. In particular, an anomaly detector with a desired false positive rate can be achieved through a reduction to Neyman-Pearson classification. Unlike the inductive approach, the transductive approach yields detectors that are optimal (e.g., statistically consistent) regardless of the distribution of anomalies. Therefore, in anomaly detection, unlabeled data can have a substantial impact on the theoretical properties of the decision rule.
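To make the reduction concrete, here is a minimal sketch, not the paper's exact procedure: a classifier is trained to separate the nominal sample (label 0) from the unlabeled test sample (label 1), and the decision threshold is calibrated on held-out nominal points so that the false positive rate is approximately a target alpha, in the spirit of Neyman-Pearson classification. The function name `transductive_detector`, the use of scikit-learn, and the quantile-based calibration are illustrative assumptions.

```python
# Illustrative sketch of the transductive reduction to binary classification
# (assumed implementation, not the paper's method): nominal vs. unlabeled test
# sample, with the threshold calibrated to a target false positive rate alpha.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def transductive_detector(X_nominal, X_test, alpha=0.05, seed=0):
    """Flag test points whose score exceeds the (1 - alpha) quantile
    of scores on held-out nominal data."""
    # Split the nominal sample: one part for training, one for calibration.
    X_tr, X_cal = train_test_split(X_nominal, test_size=0.3, random_state=seed)

    # Reduction to binary classification: nominal = 0, unlabeled test = 1.
    X = np.vstack([X_tr, X_test])
    y = np.concatenate([np.zeros(len(X_tr)), np.ones(len(X_test))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Neyman-Pearson-style calibration: the threshold is the (1 - alpha)
    # quantile of scores on held-out *nominal* points, so the false positive
    # rate on nominal data is approximately alpha.
    cal_scores = clf.predict_proba(X_cal)[:, 1]
    threshold = np.quantile(cal_scores, 1 - alpha)

    test_scores = clf.predict_proba(X_test)[:, 1]
    return test_scores > threshold  # True = declared anomalous


# Toy usage: nominal data is standard Gaussian; the test sample mixes
# nominal points with a shifted (anomalous) component.
rng = np.random.default_rng(0)
X_nom = rng.normal(0, 1, size=(500, 2))
X_tst = np.vstack([rng.normal(0, 1, size=(180, 2)),
                   rng.normal(4, 1, size=(20, 2))])
flags = transductive_detector(X_nom, X_tst, alpha=0.05)
print(f"flagged {flags.sum()} of {len(flags)} test points")
```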
