Detecting Blackhole and Volcano Patterns in Directed Networks

In this paper, we formulate a novel problem: finding black hole and volcano patterns in a large directed graph. Specifically, a black hole pattern is a group of nodes that has only in-links from the rest of the nodes in the graph. In contrast, a volcano pattern is a group of nodes that has only out-links to the rest of the nodes in the graph. Both patterns can be observed in the real world. For instance, in a trading network, a black hole pattern may represent a group of traders who are manipulating the market. We first prove that the black hole mining problem is a dual problem of finding volcanoes, and we therefore focus on finding black hole patterns. Along this line, we design two pruning schemes to guide the black hole finding process. In the first pruning scheme, we strategically prune the search space based on a set of pattern-size-independent pruning rules and develop the iBlackhole algorithm. The second pruning scheme follows a divide-and-conquer strategy to further exploit the pruning results of the first scheme. Indeed, the first pruning scheme can divide a target directed graph into several disconnected subgraphs, so black hole finding can be conducted in each disconnected subgraph rather than in the whole graph. Based on these two pruning schemes, we also develop the iBlackhole-DC algorithm. Finally, experimental results on real-world data show that the iBlackhole-DC algorithm can be several orders of magnitude faster than the iBlackhole algorithm, which itself has a huge computational advantage over a brute-force method.
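The two definitions and the duality between them can be illustrated with a minimal Python sketch. It is not the iBlackhole algorithm and applies none of the pruning rules; it only checks the boundary condition stated above for a given candidate group. The function names, the edge-list representation, and the toy graph are assumptions made for illustration, and any further conditions the full problem formulation may impose on a group (for example, on its size or internal connectivity) are not modeled.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (source, target); hypothetical representation


def is_black_hole(edges: Iterable[Edge], group: Set[str]) -> bool:
    """True if no edge leaves `group` for the rest of the graph.

    Assumption: only the boundary condition from the abstract is checked
    (the group may receive in-links but has no out-links to outside nodes).
    """
    if not group:
        return False
    return not any(u in group and v not in group for u, v in edges)


def reverse(edges: Iterable[Edge]) -> List[Edge]:
    """Return the graph with every edge direction flipped."""
    return [(v, u) for u, v in edges]


def is_volcano(edges: Iterable[Edge], group: Set[str]) -> bool:
    """Duality: a volcano is a black hole of the edge-reversed graph."""
    return is_black_hole(reverse(edges), group)


# Toy graph: a and b only send links outward; x and y only link to each other.
EDGES: List[Edge] = [
    ("a", "x"), ("b", "x"), ("b", "a"),
    ("x", "y"), ("y", "x"),
]

print(is_black_hole(EDGES, {"x", "y"}))  # no edge leaves {x, y}
print(is_volcano(EDGES, {"a", "b"}))     # no edge enters {a, b}
```

Because `is_volcano` is defined purely by reversing edges and reusing `is_black_hole`, any procedure that finds black holes can be reused to find volcanoes, which is why it is enough to study one of the two problems.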
