Paper Information - Subgraph Discovery in Information Networks

Subgraph Discovery in Information Networks

Top-K Interesting Subgraph Discovery in Information Networks

Manish Gupta (Microsoft, India; gmanish@microsoft.com), Jing Gao (State University of New York at Buffalo; jing@buffalo.edu), Xifeng Yan (University of California, Santa Barbara; xyan@cs.ucsb.edu), Hasan Cam (US Army Research Lab; hasan.cam.civ@mail.mil), and Jiawei Han (University of Illinois at Urbana-Champaign; hanj@cs.uiuc.edu)

Conference Name: Proc. 2014 IEEE Int. Conf. on Data Engineering (ICDE'14), Chicago, IL, Mar. 2014
Conference Date: March 03, 2014

Abstract: In the real world, various systems can be modeled using heterogeneous networks which consist of entities of different types. Many problems on such networks can be mapped to an underlying critical problem of discovering top-K subgraphs of entities with rare and surprising associations. Answering such subgraph queries efficiently involves two main challenges: (1) computing all matching subgraphs which satisfy the query and (2) ranking such results based on the rarity and the interestingness of the associations among entities in the subgraphs. Previous work on the matching problem can be harnessed for a naïve ranking-after-matching solution. However, for large graphs, subgraph queries may have an enormous number of matches, and so it is inefficient to compute all matches when only the top-K matches are desired. In this paper, we address the two challenges of matching and ranking in top-K subgraph discovery as follows. First, we introduce two index structures for the network: topology index, and graph maximum metapath weight index, which are both computed offline. Second, we propose novel top-K mechanisms to exploit these indexes for answering interesting subgraph queries online efficiently. Experimental results on several synthetic datasets and the DBLP and Wikipedia datasets containing thousands of entities show the efficiency and the effectiveness of the proposed approach in computing interesting subgraphs.
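As a rough illustration of the contrast the abstract draws, the sketch below compares a ranking-after-matching baseline with a top-K search that prunes partial matches using a precomputed upper bound. It is not the paper's algorithm. The toy graph, the path-shaped query, the sum-of-edge-weights score, and the per-type-pair maximum weight table (a much simplified stand-in for an offline index) are all assumptions made for illustration, as are the function names.

```python
import heapq

# Toy heterogeneous network. Every name, type, and weight is hypothetical.
NODE_TYPE = {
    "a1": "author", "a2": "author", "a3": "author",
    "p1": "paper", "p2": "paper",
    "v1": "venue", "v2": "venue",
}
# Undirected edges with non-negative integer weights (assumed to encode interestingness).
EDGES = {
    ("a1", "p2"): 7, ("a2", "p1"): 9, ("a2", "p2"): 2, ("a3", "p1"): 4,
    ("p1", "v1"): 5, ("p2", "v1"): 8, ("p2", "v2"): 3,
}

ADJ = {}
for (u, v), w in EDGES.items():
    ADJ.setdefault(u, {})[v] = w
    ADJ.setdefault(v, {})[u] = w


def all_matches(type_path):
    """Yield (score, nodes) for every simple path whose node types follow type_path."""
    def extend(nodes, score):
        if len(nodes) == len(type_path):
            yield score, tuple(nodes)
            return
        wanted = type_path[len(nodes)]
        for nxt, w in ADJ.get(nodes[-1], {}).items():
            if NODE_TYPE[nxt] == wanted and nxt not in nodes:
                yield from extend(nodes + [nxt], score + w)

    for start, node_type in NODE_TYPE.items():
        if node_type == type_path[0]:
            yield from extend([start], 0)


def rank_after_matching(type_path, k):
    """Baseline: enumerate every match, then keep the k highest scores."""
    return heapq.nlargest(k, all_matches(type_path))


def topk_with_pruning(type_path, k):
    """Keep a size-k min-heap and skip partial matches that cannot beat it."""
    if k <= 0:
        return []
    # Precomputed once: heaviest edge between each ordered pair of node types.
    pair_max = {}
    for (u, v), w in EDGES.items():
        for a, b in ((u, v), (v, u)):
            key = (NODE_TYPE[a], NODE_TYPE[b])
            pair_max[key] = max(pair_max.get(key, 0), w)
    hops = [pair_max.get(pair, 0) for pair in zip(type_path, type_path[1:])]
    # remaining[i] bounds the weight still obtainable once i hops are taken.
    remaining = [sum(hops[i:]) for i in range(len(hops) + 1)]

    heap = []  # min-heap of the best (score, nodes) found so far

    def extend(nodes, score):
        taken = len(nodes) - 1
        if len(nodes) == len(type_path):
            if len(heap) < k:
                heapq.heappush(heap, (score, tuple(nodes)))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, tuple(nodes)))
            return
        if len(heap) == k and score + remaining[taken] <= heap[0][0]:
            return  # even the best possible completion cannot enter the top k
        wanted = type_path[len(nodes)]
        for nxt, w in ADJ.get(nodes[-1], {}).items():
            if NODE_TYPE[nxt] == wanted and nxt not in nodes:
                extend(nodes + [nxt], score + w)

    for start, node_type in NODE_TYPE.items():
        if node_type == type_path[0]:
            extend([start], 0)
    return sorted(heap, reverse=True)


query = ("author", "paper", "venue")
print(rank_after_matching(query, 2))
print(topk_with_pruning(query, 2))
```

On this toy input both functions return the same two matches, but the pruned search abandons some partial paths before reaching a venue. When several matches tie at the K-th score, the two functions may keep different ones, since the pruned version retains whichever tied match it found first.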

Jiawei Han | Xifeng Yan | Jing Gao | Hasan Cam | Manish Gupta

[1] Haixun Wang, et al. Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases, 2012, Proc. VLDB Endow.

[2] Lei Zou, et al. Top-k subgraph matching query in a large graph, 2007, PIKM '07.

[3] Hong Cheng, et al. Finding top-k similar graphs in graph databases, 2012, EDBT '12.

[4] Jiawei Han, et al. Community Distribution Outlier Detection in Heterogeneous Information Networks, 2013, ECML/PKDD.

[5] Vipin Kumar, et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, 1998, SIAM J. Sci. Comput.

[6] Yizhou Sun, et al. Community Trend Outlier Detection Using Soft Temporal Pattern Mining, 2012, ECML/PKDD.

[7] Jiawei Han, et al. Top-K aggregation queries over large networks, 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[8] Ronald Fagin, et al. Comparing top k lists, 2003, SODA '03.

[9] Julian R. Ullmann, et al. An Algorithm for Subgraph Isomorphism, 1976, J. ACM.

[10] Yizhou Sun, et al. Ranking-based clustering of heterogeneous information networks with star network schema, 2009, KDD.

[11] K. Selçuk Candan, et al. Sum-Max Monotonic Ranked Joins for Evaluating Top-K Twig Queries on Weighted Data Graphs, 2007, VLDB.

[12] Wei Jin, et al. SAPPER: Subgraph Indexing and Approximate Matching in Large Graphs, 2010, Proc. VLDB Endow.

[13] Jiawei Han, et al. On detecting Association-Based Clique Outliers in heterogeneous information networks, 2013, 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2013).

[14] Christos Faloutsos, et al. R-MAT: A Recursive Model for Graph Mining, 2004, SDM.

[15] Rada Chirkova, et al. Efficient algorithms for exact ranked twig-pattern matching over graphs, 2008, SIGMOD Conference.

[16] R. Varshney, et al. Supporting top-k join queries in relational databases, 2011.

[17] Theodoros Lappas, et al. Finding a team of experts in social networks, 2009, KDD.

[18] Ambuj K. Singh, et al. Mining Heavy Subgraphs in Time-Evolving Networks, 2011, 2011 IEEE 11th International Conference on Data Mining.

[19] Charu C. Aggarwal, et al. Outlier Detection for Temporal Data, 2014, Outlier Detection for Temporal Data.

[20] Mehmet M. Dalkilic, et al. WIGM: Discovery of Subgraph Patterns in a Large Weighted Graph, 2012, SDM.

[21] Philip S. Yu, et al. Substructure similarity search in graph databases, 2005, SIGMOD '05.

[22] Mario Vento, et al. A (sub)graph isomorphism algorithm for matching large graphs, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Yizhou Sun, et al. Integrating community matching and outlier detection for mining evolutionary community outliers, 2012, KDD.

[24] Jignesh M. Patel, et al. SAGA: a subgraph matching tool for biological graphs, 2007, Bioinform.

[25] Jiawei Han, et al. On graph query optimization in large networks, 2010, Proc. VLDB Endow.

[26] Philip S. Yu, et al. Mining top-K large structural patterns in a massive network, 2011, Proc. VLDB Endow.

[27] Ambuj K. Singh, et al. GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[28] Jianzhong Li, et al. Efficient Subgraph Matching on Billion Node Graphs, 2012, Proc. VLDB Endow.

[29] Jeffrey Xu Yu, et al. Top-K Graph Pattern Matching: A Twig Query Approach, 2012, WAIM.