Extracting top-K interesting subgraphs with weighted query semantics

Heterogeneous information networks (HINs) contain abundant information about entities such as people, places, organizations, and events, together with the relationships among them. Extracting interesting information from such networks is important in real-world applications, and in essence it can be cast as a subgraph search problem. Previous approaches focus only on structural matching, with a naive ranking that sums the edge weights of each matched subgraph in the HIN. To address this limitation, we propose the concept of weighted query measures. Specifically, we propose two interestingness measures based on weighted query semantics: Influential Edge Match (IE-Match) and Closest Match (C-Match). Moreover, we devise Destination Vertex Count (DVC), a space-efficient indexing scheme that improves the indexing and candidate generation processes. To evaluate the effectiveness of the proposed approach, we conduct extensive experiments on synthetic and real-world datasets and present interesting real-world case studies.
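
For orientation, the baseline ranking attributed above to previous approaches (scoring a matched subgraph by the sum of its edge weights) can be sketched as follows. This is only an illustrative sketch of that baseline, not of IE-Match, C-Match, or DVC. It assumes the structural matches have already been enumerated; the function names, the dictionary representation of a match, and the sample data are hypothetical.

```python
import heapq


def weight_sum(match):
    """Score a match by the sum of its edge weights (the naive baseline)."""
    return sum(match.values())


def top_k_by_weight_sum(matches, k):
    """Return the k matches with the largest total edge weight."""
    return heapq.nlargest(k, matches, key=weight_sum)


# Hypothetical candidate matches: each maps an edge (u, v) to its weight in the HIN.
matches = [
    {("alice", "paper1"): 0.9, ("paper1", "kdd"): 0.2},
    {("bob", "paper2"): 0.5, ("paper2", "vldb"): 0.5},
    {("carol", "paper3"): 0.3, ("paper3", "sigmod"): 0.4},
]

top2 = top_k_by_weight_sum(matches, 2)
for m in top2:
    print(sorted(m), weight_sum(m))
```

In this baseline every matched edge contributes equally to the score, whichever part of the query it corresponds to.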
