Proximity-Based Anomaly Detection Using Sparse Structure Learning

We consider the task of performing anomaly detection in highly noisy multivariate data. In many applications involving real-valued time-series data, such as physical sensor data and economic metrics, discovering changes and anomalies in the way variables depend on one another is of particular importance. Our goal is to robustly compute a “correlation anomaly” score for each variable by comparing the test data with reference data, even when some of the variables are highly correlated (so that collinearity exists). To remove spurious dependencies introduced by noise, we focus on the most significant dependencies for each variable. We perform this “neighborhood selection” adaptively by fitting a sparse graphical Gaussian model. Instead of traditional covariance selection procedures, we solve this problem as maximum likelihood estimation of the precision matrix (inverse covariance matrix) under an L1 penalty. The anomaly score for each variable is then computed by evaluating the distance between the fitted conditional distributions within that variable's Markov blanket, for the two data sets being compared. Using real-world data, we demonstrate that our matrix-based sparse structure learning approach successfully detects correlation anomalies under collinearity and heavy noise.
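The pipeline described above can be sketched in a few steps: fit an L1-penalized Gaussian graphical model (graphical lasso) to the reference and test data separately, read each variable's conditional distribution off the estimated precision matrices, and score each variable by how much its conditional distribution differs between the two fits. The sketch below is a simplified illustration under assumed choices, not the paper's exact scoring formula: it uses scikit-learn's `GraphicalLasso` for the sparse precision estimate, and a per-variable score combining a symmetric KL term between the conditional variances with the squared difference of the neighborhood regression weights.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso


def fit_sparse_precision(X, alpha=0.1):
    """L1-penalized MLE of the precision matrix (graphical lasso)."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each variable
    model = GraphicalLasso(alpha=alpha, max_iter=200).fit(X)
    return model.precision_


def conditional_params(precision, i):
    """Gaussian conditional p(x_i | rest) implied by a precision matrix:
    regression weights on the other variables, and the residual variance."""
    var = 1.0 / precision[i, i]
    w = -precision[i, :] * var  # nonzero only inside x_i's Markov blanket
    w[i] = 0.0
    return w, var


def anomaly_scores(X_ref, X_test, alpha=0.1):
    """Simplified per-variable correlation-anomaly score (illustrative only):
    symmetric KL between the conditional variances plus the squared change
    in neighborhood weights, comparing reference vs. test fits."""
    prec_ref = fit_sparse_precision(X_ref, alpha)
    prec_tst = fit_sparse_precision(X_test, alpha)
    d = X_ref.shape[1]
    scores = np.empty(d)
    for i in range(d):
        w_r, v_r = conditional_params(prec_ref, i)
        w_t, v_t = conditional_params(prec_tst, i)
        sym_kl = 0.5 * (v_r / v_t + v_t / v_r - 2.0)  # >= 0 by AM-GM
        scores[i] = sym_kl + np.sum((w_r - w_t) ** 2)
    return scores
```

Because the L1 penalty zeroes out weak partial correlations, each variable's score depends only on its selected neighbors, which is what makes the comparison robust to noise-induced spurious dependencies.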
