Anomaly Detection in Large Graphs

Discovering anomalies is an important and challenging task in many settings, from network intrusion to fraud detection. However, most work to date has focused on clouds of multi-dimensional points, with little emphasis on graph data; even then, the focus is on unweighted, node-labeled graphs. Here we propose OddBall, an algorithm to detect anomalous nodes in weighted graphs. The contributions are the following: (a) we carefully choose features that easily reveal nodes with strange behavior; (b) we discover several new rules (power laws) in density, weights, ranks and eigenvalues that seem to govern the so-called “neighborhood graphs”, and we show how to use them for anomaly detection; (c) we empirically show that our method scales linearly with the number of edges in the graph; and (d) we report experiments on many real graphs with up to 1.5 million nodes, where OddBall indeed spots unusual nodes that agree with intuition.
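
The idea of fitting a power law to neighborhood-graph features and flagging nodes that deviate from it can be illustrated with a minimal sketch. The abstract does not spell out the features or the scoring function, so everything here is an assumption: the sketch uses only the node count and edge count of each node's 1-hop neighborhood graph, ignores edge weights, fits a line by least squares in log-log space, and scores a node by its absolute log-distance from that line. The function names `egonet_features`, `fit_power_law`, and `anomaly_scores` are hypothetical and are not taken from OddBall.

```python
import math
from collections import defaultdict


def egonet_features(edges):
    """Return {node: (n_nodes, n_edges)} for each node's 1-hop neighborhood graph.

    Assumption: the graph is undirected and unweighted; self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    feats = {}
    for node, nbrs in adj.items():
        # Edges among the neighbors; each one is counted from both endpoints.
        between = sum(len(adj[n] & nbrs) for n in nbrs) // 2
        # Nodes: the node plus its neighbors. Edges: spokes plus neighbor-neighbor edges.
        feats[node] = (1 + len(nbrs), len(nbrs) + between)
    return feats


def fit_power_law(points):
    """Least-squares fit of y = C * x**alpha in log-log space.

    Returns (log_c, alpha). Needs at least two distinct positive x values.
    """
    logs = [(math.log(x), math.log(y)) for x, y in points]
    n = len(logs)
    mean_x = sum(lx for lx, _ in logs) / n
    mean_y = sum(ly for _, ly in logs) / n
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    alpha = sxy / sxx
    return mean_y - alpha * mean_x, alpha


def anomaly_scores(edges):
    """Score each node by its absolute log-distance from the fitted power law.

    Assumption: this distance is a simplified stand-in, not the paper's score.
    """
    feats = egonet_features(edges)
    log_c, alpha = fit_power_law(list(feats.values()))
    return {
        node: abs(math.log(e) - (log_c + alpha * math.log(n)))
        for node, (n, e) in feats.items()
    }
```

Nodes with the largest scores are the ones whose neighborhood is unusually dense or unusually sparse for its size, relative to the trend followed by the rest of the graph. Each neighborhood is built from adjacency sets only, which keeps the sketch short; it makes no attempt to reproduce the linear scaling reported in the abstract.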
