Query-based graph cuboid outlier detection

Various projections or views of a heterogeneous information network can be modeled using the graph OLAP (On-line Analytical Processing) framework for effective decision making. Detecting anomalous projections of the network can help the analysts identify regions of interest from the graph specific to the projection attribute. While most previous studies on outlier detection in graphs deal with outlier nodes, edges or subgraphs, we are the first to propose detection of graph cuboid outliers. Further we perform this detection in a query sensitive way. Given a general subgraph query on a heterogeneous network, we study the problem of finding outlier cuboids from the graph OLAP lattice. A Graph Cuboid Outlier (GCOutlier) is a cuboid with exceptionally high density of matches for the query. The GCOutlier detection task is clearly challenging because: (1) finding matches for the query (subgraph isomorphism) is NP-hard; (2) number of matches for the query can be very high; and (3) number of cuboids can be large. We provide an approximate solution to the problem by computing only a fraction of the total matches originating from a select set of candidate nodes and including a select set of edges, chosen smartly. We perform extensive experiments on synthetic datasets to showcase the execution time versus accuracy trade-off. Experiments on real datasets like Four Area and Delicious containing thousands of nodes reveal interesting GCOutliers.

[1]  Yizhou Sun,et al.  On community outliers and their efficient detection in information networks , 2010, KDD.

[2]  Hisashi Kashima,et al.  Eigenspace-based anomaly detection in computer systems , 2004, KDD.

[3]  Victoria J. Hodge,et al.  A Survey of Outlier Detection Methodologies , 2004, Artificial Intelligence Review.

[4]  Philip S. Yu,et al.  Outlier detection in graph streams , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[5]  Yizhou Sun,et al.  Community Trend Outlier Detection Using Soft Temporal Pattern Mining , 2012, ECML/PKDD.

[6]  Jiawei Han,et al.  On detecting Association-Based Clique Outliers in heterogeneous information networks , 2013, 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013).

[7]  Brandon Pincombea,et al.  Anomaly Detection in Time Series of Graphs using ARMA Processes , 2007 .

[8]  Charu C. Aggarwal,et al.  On clustering heterogeneous social media objects with outlier links , 2012, WSDM '12.

[9]  Jiawei Han,et al.  Top-K interesting subgraph discovery in information networks , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[10]  Diane J. Cook,et al.  Graph-based anomaly detection , 2003, KDD '03.

[11]  Charu C. Aggarwal,et al.  Outlier Detection for Temporal Data , 2014, Outlier Detection for Temporal Data.

[12]  Christos Faloutsos,et al.  R-MAT: A Recursive Model for Graph Mining , 2004, SDM.

[13]  Christos Faloutsos,et al.  oddball: Spotting Anomalies in Weighted Graphs , 2010, PAKDD.

[14]  Jiawei Han,et al.  Local Learning for Mining Outlier Subgraphs from Network Datasets , 2014, SDM.

[15]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[16]  Jiawei Han,et al.  Mining Query-Based Subnetwork Outliers in Heterogeneous Information Networks , 2014, 2014 IEEE International Conference on Data Mining.

[17]  Christos Faloutsos,et al.  Metric forensics: a multi-level approach for mining volatile graphs , 2010, KDD.

[18]  Jiawei Han,et al.  Community Distribution Outlier Detection in Heterogeneous Information Networks , 2013, ECML/PKDD.

[19]  Yizhou Sun,et al.  Integrating community matching and outlier detection for mining evolutionary community outliers , 2012, KDD.

[20]  Charu C. Aggarwal,et al.  Outlier Analysis , 2013, Springer New York.

[21]  Jiawei Han,et al.  On graph query optimization in large networks , 2010, Proc. VLDB Endow..

[22]  Philip S. Yu,et al.  Graph OLAP: Towards Online Analytical Processing on Graphs , 2008, 2008 Eighth IEEE International Conference on Data Mining.