Detecting Blackholes and Volcanoes in Directed Networks

In this paper, we formulate a novel problem for finding blackhole and volcano patterns in a large directed graph. Specifically, a blackhole pattern is a group which is made of a set of nodes in a way such that there are only inlinks to this group from the rest nodes in the graph. In contrast, a volcano pattern is a group which only has outlinks to the rest nodes in the graph. Both patterns can be observed in real world. For instance, in a trading network, a blackhole pattern may represent a group of traders who are manipulating the market. In the paper, we first prove that the blackhole mining problem is a dual problem of finding volcanoes. Therefore, we focus on finding the blackhole patterns. Along this line, we design two pruning schemes to guide the blackhole finding process. In the first pruning scheme, we strategically prune the search space based on a set of pattern-size-independent pruning rules and develop an iBlackhole algorithm. The second pruning scheme follows a divide-and-conquer strategy to further exploit the pruning results from the first pruning scheme. Indeed, a target directed graphs can be divided into several disconnected subgraphs by the first pruning scheme, and thus the blackhole finding can be conducted in each disconnected subgraph rather than in a large graph. Based on these two pruning schemes, we also develop an iBlackhole-DC algorithm. Finally, experimental results on real-world data show that the iBlackhole-DC algorithm can be several orders of magnitude faster than the iBlackhole algorithm, which has a huge computational advantage over a brute-force method.

[1]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[2]  Jure Leskovec,et al.  The dynamics of viral marketing , 2005, EC '06.

[3]  Mong-Li Lee,et al.  A Partition-Based Approach to Graph Mining , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[4]  Bart Selman,et al.  Natural communities in large linked networks , 2003, KDD '03.

[5]  Panos M. Pardalos,et al.  Statistical analysis of financial networks , 2005, Comput. Stat. Data Anal..

[6]  Hui Xiong,et al.  Mining globally distributed frequent subgraphs in a single labeled graph , 2009, Data Knowl. Eng..

[7]  Lawrence B. Holder,et al.  Substructure Discovery Using Minimum Description Length and Background Knowledge , 1993, J. Artif. Intell. Res..

[8]  Jure Leskovec,et al.  Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters , 2008, Internet Math..

[9]  M E J Newman,et al.  Finding and evaluating community structure in networks. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[10]  Wei Wang,et al.  Efficient mining of frequent subgraphs in the presence of isomorphism , 2003, Third IEEE International Conference on Data Mining.

[11]  Kristina Lerman,et al.  Community Detection Using a Measure of Global Influence , 2008, SNAKDD.

[12]  Bethany S. Dohleman Exploratory social network analysis with Pajek , 2006 .

[13]  George Karypis,et al.  Finding Frequent Patterns in a Large Sparse Graph* , 2005, Data Mining and Knowledge Discovery.

[14]  Chen Wang,et al.  Scalable mining of large disk-based graph databases , 2004, KDD.

[15]  Mark Newman,et al.  Detecting community structure in networks , 2004 .

[16]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.