Nonparametric Detection of Anomalous Data Streams

A nonparametric anomalous hypothesis testing problem is investigated, in which there are a total of $n$ observed sequences, out of which $s$ anomalous sequences are to be detected. Each typical sequence consists of $m$ independent and identically distributed (i.i.d.) samples drawn from a distribution $p$, whereas each anomalous sequence consists of $m$ i.i.d. samples drawn from a distribution $q$ that is distinct from $p$. The distributions $p$ and $q$ are assumed to be unknown in advance. Distribution-free tests are constructed using the maximum mean discrepancy (MMD) as the metric, which is based on mean embeddings of distributions into a reproducing kernel Hilbert space. The probability of error is bounded as a function of the sample size $m$, the number $s$ of anomalous sequences, and the number $n$ of sequences. It is shown that with $s$ known, the constructed test is exponentially consistent for any $p$ and $q$ if $m$ exceeds a constant multiple of $\log n$, whereas with $s$ unknown, $m$ must be of an order strictly greater than $\log n$. Furthermore, it is shown that no test can be consistent for arbitrary $p$ and $q$ if $m$ is less than a constant multiple of $\log n$. Thus, the order-level optimality of the proposed test is established. Numerical results are provided to demonstrate that the proposed tests outperform (or perform as well as) tests based on other competitive approaches in a variety of settings.
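To make the known-s setting concrete, the following is a minimal sketch of an MMD-based detector, not the paper's exact construction. It rests on several assumptions that the abstract does not state: scalar samples, a Gaussian kernel with a fixed bandwidth, the biased (V-statistic) estimate of the squared MMD, and a decision rule that scores each sequence against the pool of all other sequences and declares the s highest-scoring sequences anomalous. The names `gaussian_kernel`, `block_kernel_sums`, `mmd_scores`, and `detect_anomalous` are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; kernel and bandwidth are assumptions."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def block_kernel_sums(sequences, bandwidth=1.0):
    """Return G with G[i][j] = sum of k(x, y) over x in sequence i, y in sequence j."""
    n = len(sequences)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            total = sum(
                gaussian_kernel(x, y, bandwidth)
                for x in sequences[i]
                for y in sequences[j]
            )
            G[i][j] = G[j][i] = total  # the kernel is symmetric
    return G


def mmd_scores(sequences, bandwidth=1.0):
    """Biased squared-MMD estimate between each sequence and the pool of the others."""
    n = len(sequences)
    m = len(sequences[0])  # all sequences are assumed to have length m
    G = block_kernel_sums(sequences, bandwidth)
    pool = (n - 1) * m  # number of samples in the pool of the other sequences
    scores = []
    for k in range(n):
        others = [j for j in range(n) if j != k]
        within_k = G[k][k] / (m * m)
        within_pool = sum(G[i][j] for i in others for j in others) / (pool * pool)
        cross = sum(G[k][j] for j in others) / (m * pool)
        scores.append(within_k + within_pool - 2.0 * cross)
    return scores


def detect_anomalous(sequences, s, bandwidth=1.0):
    """Declare the s sequences with the largest scores anomalous (s assumed known)."""
    scores = mmd_scores(sequences, bandwidth)
    ranked = sorted(range(len(sequences)), key=lambda k: scores[k], reverse=True)
    return sorted(ranked[:s])


# Synthetic illustration: p = N(0, 1), q = N(3, 1); these choices are arbitrary.
rng = random.Random(0)
n, m, s = 8, 40, 2
sequences = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
for k in (2, 5):  # indices planted as anomalous
    sequences[k] = [rng.gauss(3.0, 1.0) for _ in range(m)]

detected = detect_anomalous(sequences, s)
print(detected)
```

Precomputing the block kernel sums once keeps the cost at one pass over all sample pairs, rather than recomputing the pooled terms for every candidate sequence. When s is unknown, a threshold on the scores would be needed instead of a top-s rule; this sketch does not attempt that case.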
