Prioritized Relationship Analysis in Heterogeneous Information Networks

An increasing number of applications are modeled and analyzed as networks, where nodes represent entities of interest and edges represent interactions or relationships between them. Relationship analysis tools commonly assume that all nodes share one type and all edges share one type. Recent research has relaxed this assumption of homogeneity and focused on mining heterogeneous information networks (HINs), in which both nodes and edges can be of different types. Building on such efforts, we articulate a novel approach for mining relationships between entities in such networks while accounting for user preference over relationship type and interestingness metric. We formalize the problem as a top-k lightest paths problem, contextualized in a real-world communication network, and seek the k most interesting path instances that match the preferred relationship type. Our solution, PROphetic HEuristic Algorithm for Path Searching (PRO-HEAPS), combines novel graph preprocessing techniques, well-designed heuristics, and the venerable A* search algorithm. We run our algorithm on large-scale real-world graphs and show that it significantly outperforms a wide variety of baseline approaches, with speedups as large as 100x. To widen the range of applications, we also extend PRO-HEAPS to (i) support relationship analysis between two groups of entities and (ii) allow the pattern path in a query to contain logical statements with the operators AND, OR, and NOT, as well as the wildcard “.”. Experiments with this generalized version of PRO-HEAPS demonstrate that its advantage becomes even more pronounced in these general cases. Furthermore, we conduct a comprehensive analysis of how the performance of PRO-HEAPS varies with various attributes of the input HIN. Finally, we present a case study that demonstrates valuable applications of our algorithm.
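To make the problem statement concrete, the following is a minimal Python sketch of a top-k lightest paths search over a typed graph. It is not the PRO-HEAPS implementation: it uses none of the paper's preprocessing or heuristics. It is a plain best-first (A*-style) search over partial paths with a pluggable heuristic `h` that defaults to zero. The function name, the data layout (`node_type`, `edges`), the toy node names, and the convention that a lower weight means a more interesting path are all assumptions made for illustration.

```python
import heapq
from itertools import count


def top_k_lightest_paths(node_type, edges, source, target, pattern, k,
                         h=lambda node, pos: 0.0):
    """Return up to k (weight, path) pairs, lightest first.

    node_type: dict mapping node -> type label (hypothetical layout)
    edges:     dict mapping node -> list of (neighbor, weight), weights >= 0
    pattern:   sequence of type labels the path must match, position by position
    h:         admissible heuristic, a lower bound on the remaining weight
               from (node, position); the default 0 gives uniform-cost search
    """
    if node_type.get(source) != pattern[0]:
        return []
    tie = count()  # tie-breaker so the heap never has to compare paths
    frontier = [(h(source, 0), 0.0, next(tie), (source,))]
    results = []
    last = len(pattern) - 1
    while frontier and len(results) < k:
        _, g, _, path = heapq.heappop(frontier)
        pos, node = len(path) - 1, path[-1]
        if pos == last:
            # The pattern is fully consumed: keep the path only if it ends at target.
            if node == target:
                results.append((g, list(path)))
            continue
        for nbr, w in edges.get(node, ()):
            # Extend only along neighbors whose type matches the next pattern slot.
            if node_type.get(nbr) == pattern[pos + 1]:
                g2 = g + w
                heapq.heappush(frontier,
                               (g2 + h(nbr, pos + 1), g2, next(tie), path + (nbr,)))
    return results


# Toy communication network (all names and weights are made up).
node_type = {"alice": "user", "bob": "user",
             "t1": "thread", "t2": "thread", "t3": "thread"}
edges = {
    "alice": [("t1", 1.0), ("t2", 0.5), ("t3", 2.0)],
    "t1": [("bob", 1.0)],
    "t2": [("bob", 2.0)],
    "t3": [("bob", 0.25)],
}
results = top_k_lightest_paths(node_type, edges, "alice", "bob",
                               ["user", "thread", "user"], k=2)
```

Because the search expands partial paths rather than nodes, the same node may appear in several results, and complete paths leave the heap in nondecreasing weight order as long as `h` never overestimates the remaining weight. A tighter admissible `h` would prune more of the frontier, which is the role the abstract assigns to heuristics combined with A*.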
