Detection of Cross-Channel Anomalies from Multiple Data Channels

We identify and formulate a novel problem: cross-channel anomaly detection from multiple data channels. Cross-channel anomalies are anomalies that are common to several individual channels, and they often portend significant events. Using spectral approaches, we propose a two-stage detection method: anomaly detection at the single-channel level, followed by detection of cross-channel anomalies from the amalgamation of the single-channel anomalies. Our mathematical analysis shows that the method is likely to reduce the false alarm rate. We demonstrate the method in two applications: document understanding with multiple text corpora, and detection of repeated anomalies in video surveillance. The experimental results consistently demonstrate the superior performance of our method compared with related state-of-the-art methods, including the one-class SVM and principal component pursuit. In addition, our framework can be deployed in a decentralized manner, lending itself to large-scale data stream analysis.
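The two-stage structure described above can be illustrated with a minimal sketch. It is not the method of the paper, whose details the abstract does not give. It assumes each channel is a feature-by-time matrix with a shared time axis, uses a PCA residual norm as the single-channel spectral score, and replaces the second spectral stage with a plain vote count across channels. The function names and the parameters `k`, `z`, and `min_channels` are hypothetical.

```python
import numpy as np

def residual_scores(X, k):
    """Stage 1 score for one channel (features x time): the norm of each
    column's component outside the top-k principal subspace."""
    Xc = X - X.mean(axis=1, keepdims=True)        # center each feature over time
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Uk = U[:, :k]                                 # top-k left singular vectors
    R = Xc - Uk @ (Uk.T @ Xc)                     # residual after projection
    return np.linalg.norm(R, axis=0)

def flag(scores, z):
    """Flag scores whose robust z-score (median/MAD based) exceeds z."""
    med = np.median(scores)
    mad = np.median(np.abs(scores - med)) + 1e-12  # guard against zero MAD
    return (scores - med) / (1.4826 * mad) > z

def cross_channel_anomalies(channels, k=2, z=4.0, min_channels=2):
    """Stage 2 (assumed stand-in): keep the time steps flagged in at least
    min_channels channels. Returns their indices and the per-channel flags."""
    flags = np.stack([flag(residual_scores(X, k), z) for X in channels])
    votes = flags.sum(axis=0)
    return np.flatnonzero(votes >= min_channels), flags
```

Because each channel is scored independently and only the boolean flags are combined, the first stage could run where each channel is collected, which is one way to read the decentralized deployment mentioned above. Requiring agreement across channels also discards isolated single-channel alarms, which matches the intuition behind the reduced false alarm rate.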
