Discovering Contexts and Contextual Outliers Using Random Walks in Graphs

Identifying contextual outliers allows the discovery of anomalous behavior that other forms of outlier detection cannot find. Behavior that appears normal with respect to the entire data set can be shown to be anomalous by subsetting the data according to a specific spatial or temporal context. However, in many real-world applications, we may not have sufficient a priori contextual information to discover these contextual outliers. This paper addresses the problem by proposing a probabilistic approach based on random walks, which can simultaneously explore meaningful contexts and score contextual outliers therein. Our approach has several advantages: its outlier scores can be interpreted as stationary expectations, and they can be calculated in closed form in polynomial time. In addition, we show that point outlier detection using the stationary distribution is a special case of our approach. Our approach therefore finds both global and contextual outliers simultaneously and creates a meaningful ranked list consisting of both types of outliers. This is a major departure from existing work, where an algorithm typically identifies only one type of outlier. The effectiveness of our method is demonstrated by empirical results on real data sets, with comparison to related work.
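
As a rough illustration of the special case mentioned above (point outlier detection using the stationary distribution of a random walk), the following is a minimal sketch and not the method proposed in the paper. The function name `stationary_outlier_scores`, the Gaussian similarity kernel, the `bandwidth` parameter, and the toy data are all assumptions made for this example. A point that the walk rarely visits receives a low stationary probability and is treated as a candidate global outlier.

```python
import math


def stationary_outlier_scores(points, bandwidth=1.0, iters=200):
    """Return the stationary probability of each point under a random walk.

    Hypothetical helper for illustration only: the similarity graph uses a
    Gaussian kernel (an assumption), and lower scores suggest outliers.
    """
    n = len(points)
    # Pairwise Gaussian similarities; no self-loops.
    sim = [
        [
            0.0 if i == j
            else math.exp(-math.dist(points[i], points[j]) ** 2
                          / (2 * bandwidth ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]
    # Row-normalize similarities into a transition matrix.
    trans = [[s / sum(row) for s in row] for row in sim]
    # Power iteration: repeatedly apply pi <- pi * P from a uniform start.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return pi


# Toy data: four nearby points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (3.0, 3.0)]
scores = stationary_outlier_scores(points)
# Index of the point the walk visits least often.
most_anomalous = min(range(len(points)), key=lambda i: scores[i])
```

In this toy setup the distant point is expected to receive the smallest score, since the walk seldom transitions to it. Contextual outliers, which the paper targets, would require the additional context discovery step that this sketch does not attempt.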
