Expected similarity estimation for large-scale batch and streaming anomaly detection

We present a novel algorithm for anomaly detection on very large datasets and data streams. The method, named EXPected Similarity Estimation (expose), is kernel-based and able to efficiently compute the similarity between new data points and the distribution of regular data. The estimator is formulated as an inner product with a reproducing kernel Hilbert space embedding and makes no assumption about the type or shape of the underlying data distribution. We show that offline (batch) learning with exposecan be done in linear time and online (incremental) learning takes constant time per instance and model update. Furthermore, exposecan make predictions in constant time, while it requires only constant memory. In addition, we propose different methodologies for concept drift adaptation on evolving data streams. On several real datasets we demonstrate that our approach can compete with state of the art algorithms for anomaly detection while being an order of magnitude faster than most other approaches.

[1]  Fabio Tozeto Ramos,et al.  On Integrated Clustering and Outlier Detection , 2014, NIPS.

[2]  Victoria J. Hodge,et al.  A Survey of Outlier Detection Methodologies , 2004, Artificial Intelligence Review.

[3]  ShimKyuseok,et al.  Efficient algorithms for mining outliers from large data sets , 2000 .

[4]  Clara Pizzuti,et al.  Fast Outlier Detection in High Dimensional Spaces , 2002, PKDD.

[5]  Felix Naumann,et al.  Data fusion , 2009, CSUR.

[6]  Andreas Christmann,et al.  Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.

[7]  Gert Vegter,et al.  In handbook of discrete and computational geometry , 1997 .

[8]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[9]  M. Friedman The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance , 1937 .

[10]  Svetha Venkatesh,et al.  Using multiple windows to track concept drift , 2004, Intell. Data Anal..

[11]  Michael J. Pazzani,et al.  Knowledge discovery from data? , 2000, IEEE Intell. Syst..

[12]  Barnabás Póczos,et al.  Hierarchical Probabilistic Models for Group Anomaly Detection , 2011, AISTATS.

[13]  Le Gruenwald,et al.  Research issues in outlier detection for data streams , 2014, SKDD.

[14]  Vipin Kumar,et al.  Parallel and Distributed Computing for Cybersecurity , 2005, IEEE Distributed Syst. Online.

[15]  Yannis Manolopoulos,et al.  Continuous monitoring of distance-based outliers over data streams , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[16]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[17]  Graham J. Williams,et al.  On-Line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms , 2000, KDD '00.

[18]  André Carlos Ponce de Leon Ferreira de Carvalho,et al.  OLINDDA: a cluster-based approach for detecting novelty and concept drift in data streams , 2007, SAC '07.

[19]  Gerhard Widmer,et al.  Learning in the Presence of Concept Drift and Hidden Contexts , 1996, Machine Learning.

[20]  Andrew Zisserman,et al.  Efficient Additive Kernels via Explicit Feature Maps , 2012, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Geoff Hulten,et al.  Mining high-speed data streams , 2000, KDD '00.

[22]  Markus Schneider Probability Inequalities for Kernel Embeddings in Sampling without Replacement , 2016, AISTATS.

[23]  M LazarescuMihai,et al.  Using multiple windows to track concept drift , 2004 .

[24]  Arthur Gretton,et al.  On-line one-class support vector machines. An application to signal segmentation , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[25]  Geoff Hulten,et al.  Catching up with the Data: Research Issues in Mining Data Streams , 2001, DMKD.

[26]  Ingo Steinwart,et al.  Sparseness of Support Vector Machines , 2003, J. Mach. Learn. Res..

[27]  Markus Schneider,et al.  Expected similarity estimation for large scale anomaly detection , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).

[28]  Aleksandar Lazarevic,et al.  Incremental Local Outlier Detection for Data Streams , 2007, 2007 IEEE Symposium on Computational Intelligence and Data Mining.

[29]  Harish Karnick,et al.  Random Feature Maps for Dot Product Kernels , 2012, AISTATS.

[30]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[31]  André Carlos Ponce de Leon Ferreira de Carvalho,et al.  Novelty detection algorithm for data streams multi-class problems , 2013, SAC '13.

[32]  Charu C. Aggarwal,et al.  Outlier Detection for Temporal Data: A Survey , 2014, IEEE Transactions on Knowledge and Data Engineering.

[33]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[34]  Kim C. Border,et al.  Infinite Dimensional Analysis: A Hitchhiker’s Guide , 1994 .

[35]  Benjamin Recht,et al.  Random Features for Large-Scale Kernel Machines , 2007, NIPS.

[36]  Geoff Holmes,et al.  New ensemble methods for evolving data streams , 2009, KDD.

[37]  João Gama,et al.  A survey on concept drift adaptation , 2014, ACM Comput. Surv..

[38]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[39]  Kai Ming Ting,et al.  Fast Anomaly Detection for Streaming Data , 2011, IJCAI.

[40]  P. Sajda,et al.  Detection, synthesis and compression in mammographic image analysis with a hierarchical image probability model , 2001, Proceedings IEEE Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2001).

[41]  Antonio Torralba,et al.  HOGgles: Visualizing Object Detection Features , 2013, 2013 IEEE International Conference on Computer Vision.

[42]  Bernhard Schölkopf,et al.  A Kernel Two-Sample Test , 2012, J. Mach. Learn. Res..

[43]  Matthias W. Seeger,et al.  Using the Nyström Method to Speed Up Kernel Machines , 2000, NIPS.

[44]  David M. J. Tax,et al.  One-class classification , 2001 .

[45]  Bin Zhang,et al.  A Probabilistic Fault Detection Approach: Application to Bearing Fault Detection , 2011, IEEE Transactions on Industrial Electronics.

[46]  Geoffrey I. Webb,et al.  MultiBoosting: A Technique for Combining Boosting and Wagging , 2000, Machine Learning.

[47]  Cristian Sminchisescu,et al.  Random Fourier Approximations for Skewed Multiplicative Histogram Kernels , 2010, DAGM-Symposium.

[48]  Sridhar Ramaswamy,et al.  Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[49]  T NgRaymond,et al.  Distance-based outliers: algorithms and applications , 2000, VLDB 2000.

[50]  Philip S. Yu,et al.  RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection , 2014, 2014 IEEE International Conference on Data Mining.

[51]  Fabrizio Angiulli,et al.  Detecting distance-based outliers in streams of data , 2007, CIKM '07.

[52]  O. SIAMJ.,et al.  ON THE CONVERGENCE OF PATTERN SEARCH ALGORITHMS , 1997 .

[53]  A. P. Dawid,et al.  Present position and potential developments: some personal views , 1984 .

[54]  Virginia Torczon,et al.  On the Convergence of Pattern Search Algorithms , 1997, SIAM J. Optim..

[55]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[56]  Fabrizio Angiulli,et al.  Distance-based outlier queries in data streams: the novel task and algorithms , 2010, Data Mining and Knowledge Discovery.

[57]  AI Koan,et al.  Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , 2008, NIPS.

[58]  Hans-Peter Kriegel,et al.  Angle-based outlier detection in high-dimensional data , 2008, KDD.

[59]  Kenji Fukumizu,et al.  Universality, Characteristic Kernels and RKHS Embedding of Measures , 2010, J. Mach. Learn. Res..

[60]  O. Kallenberg Foundations of Modern Probability , 2021, Probability Theory and Stochastic Modelling.

[61]  Shen-Shyang Ho,et al.  A martingale framework for concept change detection in time-varying data streams , 2005, ICML.

[62]  A. Asuncion,et al.  UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences , 2007 .

[63]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[64]  Chandan Srivastava,et al.  Support Vector Data Description , 2011 .

[65]  R. Iman,et al.  Approximations of the critical region of the fbietkan statistic , 1980 .

[66]  Bernd Freisleben,et al.  CARDWATCH: a neural network based database mining system for credit card fraud detection , 1997, Proceedings of the IEEE/IAFE 1997 Computational Intelligence for Financial Engineering (CIFEr).

[67]  Jiawei Han,et al.  Classifying large data sets using SVMs with hierarchical clusters , 2003, KDD '03.

[68]  Kenji Fukumizu,et al.  Equivalence of distance-based and RKHS-based statistics in hypothesis testing , 2012, ArXiv.

[69]  Jesús S. Aguilar-Ruiz,et al.  Knowledge discovery from data streams , 2009, Intell. Data Anal..

[70]  Le Song,et al.  A Hilbert Space Embedding for Distributions , 2007, Discovery Science.

[71]  Alexander J. Smola,et al.  Fastfood: Approximate Kernel Expansions in Loglinear Time , 2014, ArXiv.

[72]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[73]  Fei Tony Liu,et al.  Isolation-Based Anomaly Detection , 2012, TKDD.

[74]  I. Pinelis AN APPROACH TO INEQUALITIES FOR THE DISTRIBUTIONS OF INFINITE-DIMENSIONAL MARTINGALES , 1992 .