Query-driven discovery of semantically similar substructures in heterogeneous networks

Heterogeneous information networks that contain multiple types of objects and links are ubiquitous in the real world, such as bibliographic networks, cyber-physical networks, and social media networks. Although researchers have studied various data mining tasks in information networks, interactive query-based network exploration techniques have not been addressed systematically, which, in fact, are highly desirable for exploring large-scale information networks. In this demo, we introduce and demonstrate our recent research project on query-driven discovery of semantically similar substructures in heterogeneous networks. Given a subgraph query, our system searches a given large information network and finds efficiently a list of subgraphs that are structurally identical and semantically similar. Since data mining methods are used to obtain semantically similar entities (nodes), we use discovery as a term to describe this process. In order to achieve high efficiency and scalability, we design and implement a filter-and verification search framework, which can first generate promising subgraph candidates using off line indices built by data mining results, and then verify candidates with a recursive pruning matching process. The proposed system demonstrates the effectiveness of our query-driven semantic similarity search framework and the efficiency of the proposed methodology on multiple real-world heterogeneous information networks.

[1]  Jiawei Han,et al.  On graph query optimization in large networks , 2010, Proc. VLDB Endow..

[2]  Charu C. Aggarwal,et al.  Co-author Relationship Prediction in Heterogeneous Bibliographic Networks , 2011, 2011 International Conference on Advances in Social Networks Analysis and Mining.

[3]  Jiawei Han,et al.  Ranking-based classification of heterogeneous information networks , 2011, KDD.

[4]  Philip S. Yu,et al.  PathSim , 2011, Proc. VLDB Endow..

[5]  Ni Lao,et al.  Fast query execution for retrieval models based on path-constrained random walks , 2010, KDD.

[6]  Philip S. Yu,et al.  Graph Indexing: Tree + Delta >= Graph , 2007, VLDB.

[7]  Nan Li,et al.  Neighborhood based fast graph search in large networks , 2011, SIGMOD '11.

[8]  Yizhou Sun,et al.  Ranking-based clustering of heterogeneous information networks with star network schema , 2009, KDD.

[9]  Jiawei Han,et al.  Citation Prediction in Heterogeneous Bibliographic Networks , 2012, SDM.

[10]  Jiawei Han,et al.  Geo-Friends Recommendation in GPS-based Cyber-physical Social Network , 2011, 2011 International Conference on Advances in Social Networks Analysis and Mining.

[11]  Julian R. Ullmann,et al.  An Algorithm for Subgraph Isomorphism , 1976, J. ACM.