What Links Alice and Bob?: Matching and Ranking Semantic Patterns in Heterogeneous Networks

An increasing number of applications are modeled and analyzed in network form, where nodes represent entities of interest and edges represent interactions or relationships between entities. Tools for analyzing such relationships commonly assume homogeneity in both node type and edge type. Recent research has sought to relax this assumption, focusing on mining heterogeneous information networks (HINs), where both nodes and edges can be of different types. Building on these efforts, in this work we articulate a novel approach for mining relationships across entities in such networks while accounting for user preferences (prioritization) over both the relationship type and the interestingness metric. We formalize the problem as a top-$k$ lightest paths problem, contextualized in a real-world communication network, and seek to find the $k$ most interesting path instances that match the preferred relationship type. Our solution, PROphetic HEuristic Algorithm for Path Searching (PRO-HEAPS), combines novel graph preprocessing techniques, well-designed heuristics, and the venerable A* search algorithm. We evaluate our algorithm on large-scale real-world graphs and show that it significantly outperforms a wide variety of baseline approaches, with speedups of up to 100X. We also conduct a case study that demonstrates practical applications of our algorithm.
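The abstract does not spell out PRO-HEAPS itself, so the following is only a minimal sketch of the problem it targets, under our own assumptions: the network is a dict of directed, weighted adjacency lists; the preferred relationship type is encoded as a meta-path, i.e. a sequence of node types each path position must match; and a lower edge weight means a more interesting edge. The "preprocessing" here is a plain backward dynamic program that gives A* an exact distance-to-target heuristic. It is not the paper's preprocessing or heuristic, and the function names and toy data are hypothetical.

```python
import heapq
import math


def build_heuristic(edges, node_type, meta_path, target):
    """Backward DP: h[(v, i)] is the exact lightest remaining cost from node v,
    standing at position i of the meta-path, to `target` at the last position.
    States that cannot reach the target along the meta-path get no entry."""
    last = len(meta_path) - 1
    h = {}
    if node_type.get(target) == meta_path[last]:
        h[(target, last)] = 0.0
    for i in range(last - 1, -1, -1):
        for u, nbrs in edges.items():
            if node_type.get(u) != meta_path[i]:
                continue
            best = min((w + h[(v, i + 1)] for v, w in nbrs if (v, i + 1) in h),
                       default=math.inf)
            if best < math.inf:
                h[(u, i)] = best
    return h


def top_k_lightest_paths(edges, node_type, meta_path, source, target, k):
    """Return up to k (cost, path) pairs, lightest first, for simple paths
    source -> target whose node types follow `meta_path` position by position."""
    last = len(meta_path) - 1
    h = build_heuristic(edges, node_type, meta_path, target)
    if (source, 0) not in h:
        return []
    # Frontier entries: (f = g + h, g = cost so far, partial path as a tuple).
    frontier = [(h[(source, 0)], 0.0, (source,))]
    results = []
    while frontier and len(results) < k:
        _, g, path = heapq.heappop(frontier)
        i = len(path) - 1
        if i == last:  # only the target has an entry at the last position
            results.append((g, list(path)))
            continue
        for v, w in edges.get(path[-1], ()):
            hv = h.get((v, i + 1))
            # Skip wrong node types, dead ends, and revisits (keeps paths simple).
            if hv is None or v in path:
                continue
            heapq.heappush(frontier, (g + w + hv, g + w, path + (v,)))
    return results


# Toy HIN (hypothetical data): authors, papers, and one venue.
NODE_TYPE = {"alice": "author", "bob": "author",
             "p1": "paper", "p2": "paper", "p3": "paper", "v1": "venue"}
EDGES = {
    "alice": [("p1", 1.0), ("p2", 0.5), ("p3", 3.0), ("v1", 0.1)],
    "p1": [("bob", 1.0)],
    "p2": [("bob", 2.0)],
    "p3": [("bob", 0.2)],
    "v1": [("bob", 0.1)],  # lightest route overall, but the wrong relationship type
}

if __name__ == "__main__":
    print(top_k_lightest_paths(EDGES, NODE_TYPE, ["author", "paper", "author"],
                               "alice", "bob", 2))
    # [(2.0, ['alice', 'p1', 'bob']), (2.5, ['alice', 'p2', 'bob'])]
```

Because each step advances one position along the meta-path, the search space is acyclic, and because the heuristic is consistent, completed paths leave the priority queue in nondecreasing cost order, so the first $k$ popped are the $k$ lightest matching instances. Restricting results to simple paths (the `v in path` check) is our assumption; an undirected network would list each edge in both directions.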
