On detecting Association-Based Clique Outliers in heterogeneous information networks

Many real-world systems can be modeled as heterogeneous information networks, which consist of entities of different types. Users of such networks are often interested in discovering groups (or cliques) of entities that are linked to one another through rare and surprising associations. We define such anomalous cliques as Association-Based Clique Outliers (ABCOutliers) for heterogeneous information networks, and design effective approaches to detect them. The task of finding such outlier cliques can be formulated as a conjunctive select query consisting of a set of (type, predicate) pairs. Answering such conjunctive queries efficiently involves two main challenges: (1) computing all matching cliques that satisfy the query, and (2) ranking these results by the rarity and interestingness of the associations among the entities in each clique. We address these challenges as follows. First, we introduce a new low-cost graph index to assist clique matching. Second, we define the outlierness of an association between two entities based on their attribute values, and provide a methodology to efficiently compute such outliers for a given conjunctive select query. Experimental results on several synthetic datasets and on a Wikipedia dataset containing thousands of entities demonstrate the effectiveness of the proposed approach in finding interesting ABCOutliers.
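To make the query model concrete, here is a minimal sketch of the two stages the abstract names: matching cliques against a conjunctive query of (type, predicate) pairs, then ranking the matches by how unusual their pairwise associations are. It is not the paper's method. The abstract gives neither the graph index nor the outlierness formula, so this sketch assumes plain backtracking over per-slot candidate lists (no index) and a stand-in rarity score: the negative log of how often a given combination of attribute values occurs among all edges between the same two entity types. The toy network and every name (`ENTITIES`, `EDGES`, `match_cliques`, `outlierness`, `rank`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy network: entity id -> (type, attribute dict).
ENTITIES = {
    "a1": ("actor", {"nationality": "US"}),
    "a2": ("actor", {"nationality": "US"}),
    "a3": ("actor", {"nationality": "IN"}),
    "f1": ("film", {"genre": "drama"}),
    "f2": ("film", {"genre": "drama"}),
    "d1": ("director", {"nationality": "US"}),
}
# Undirected edges stored as frozensets of two entity ids.
EDGES = {frozenset(e) for e in [
    ("a1", "f1"), ("a2", "f1"), ("a3", "f2"), ("a1", "f2"),
    ("d1", "f1"), ("d1", "a1"), ("d1", "f2"), ("d1", "a3"),
]}

def adjacent(u, v):
    return frozenset((u, v)) in EDGES

def match_cliques(query):
    """Return all tuples (e_1, ..., e_k) of distinct entities where e_i has
    type t_i, satisfies predicate p_i, and every pair is linked (a clique).
    Brute-force backtracking; the paper's graph index is not modeled."""
    candidates = [
        [e for e, (t, attrs) in ENTITIES.items() if t == qt and pred(attrs)]
        for qt, pred in query
    ]
    results = []

    def extend(partial):
        if len(partial) == len(query):
            results.append(tuple(partial))
            return
        for e in candidates[len(partial)]:
            # Keep e only if it is new and linked to every entity chosen so far.
            if e not in partial and all(adjacent(e, p) for p in partial):
                extend(partial + [e])

    extend([])
    return results

def signature(e):
    t, attrs = ENTITIES[e]
    return (t, tuple(sorted(attrs.items())))

def assoc_key(u, v):
    # Order-independent key for the attribute-value combination on an edge.
    return tuple(sorted((signature(u), signature(v))))

def type_key(u, v):
    return tuple(sorted((ENTITIES[u][0], ENTITIES[v][0])))

# Frequency tables over all edges in the network.
PAIR_COUNTS = Counter(assoc_key(*tuple(e)) for e in EDGES)
TYPE_COUNTS = Counter(type_key(*tuple(e)) for e in EDGES)

def outlierness(u, v):
    """Stand-in score (an assumption, not the paper's definition): -log of the
    relative frequency of this attribute-value combination among all edges
    between the same two entity types. Assumes (u, v) is an edge."""
    freq = PAIR_COUNTS[assoc_key(u, v)] / TYPE_COUNTS[type_key(u, v)]
    return -math.log(freq)

def rank(cliques):
    """Score each clique by summing pairwise outlierness; highest first."""
    scored = [(sum(outlierness(u, v) for u, v in combinations(c, 2)), c)
              for c in cliques]
    return sorted(scored, key=lambda sc: sc[0], reverse=True)

if __name__ == "__main__":
    query = [
        ("actor", lambda a: True),
        ("film", lambda a: a.get("genre") == "drama"),
        ("director", lambda a: a.get("nationality") == "US"),
    ]
    for score, clique in rank(match_cliques(query)):
        print(f"{score:.3f}", clique)
```

In this toy network the clique containing the only Indian actor ranks first, because its actor-film and actor-director links use attribute combinations that are rare for those type pairs. Summing pairwise scores is one simple aggregation choice; a real system would replace both the scan in `match_cliques` and this scoring rule with the index and outlierness measure the paper defines.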
