Diversified top-k search with relaxed graph simulation

Graph pattern matching is used in a broad spectrum of real-world applications and, because of its importance, has been the subject of several investigations. In this context, different models, along with their corresponding algorithms, have been proposed. However, in addition to their excessive processing costs, most existing models suffer from the failing query problem because they are limited in their ability to find meaningful matches. Moreover, in some scenarios the number of matches may be enormous, making their inspection a daunting task. In this work, we introduce a new model for graph pattern matching, called relaxed graph simulation (RGS), which allows queries to be relaxed in order to identify more significant matches and to avoid the empty-set answer problem. We then formalize and study the top-k matching problem based on two classes of functions, relevance and diversity, for ranking the matches with respect to the proposed model. We also formalize and investigate the diversified top-k matching problem, and we propose a diversification function to balance relevance and diversity. Furthermore, we provide efficient algorithms based on optimization strategies to compute the top-k and the diversified top-k matches according to the RGS model. Our experimental results on four real datasets demonstrate both the effectiveness and the efficiency of the proposed approaches.
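To make the idea of balancing relevance and diversity concrete, the following is a minimal sketch of a generic bi-criteria selection and is not the diversification function or the algorithms proposed in this work. It assumes that each match already has a relevance score and a set of data-graph nodes it covers, that diversity between two matches is measured by the Jaccard distance of those sets, and that a parameter `lam` in [0, 1] trades one criterion against the other. All names (`relevance`, `covered`, `lam`, `greedy_diversified_top_k`) are hypothetical, and the greedy strategy is a common heuristic rather than an exact method.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance in [0, 1] between two sets of covered nodes (assumed diversity measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def objective(selected, relevance, covered, lam):
    """Assumed bi-criteria score: (1 - lam) * total relevance + lam * total pairwise distance."""
    rel = sum(relevance[m] for m in selected)
    div = sum(
        jaccard_distance(covered[x], covered[y])
        for x, y in combinations(selected, 2)
    )
    return (1.0 - lam) * rel + lam * div


def greedy_diversified_top_k(relevance, covered, k, lam=0.5):
    """Greedily add the match that most increases the objective until k are chosen."""
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (smallest identifier wins).
        best = max(
            sorted(remaining),
            key=lambda m: objective(selected + [m], relevance, covered, lam),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical matches: "a" and "b" cover the same nodes, "c" covers different ones.
    relevance = {"a": 0.9, "b": 0.85, "c": 0.4}
    covered = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {7, 8}}
    print(greedy_diversified_top_k(relevance, covered, k=2, lam=0.0))  # ['a', 'b']
    print(greedy_diversified_top_k(relevance, covered, k=2, lam=0.5))  # ['a', 'c']
```

With `lam = 0.0` the selection reduces to plain top-k by relevance, while a larger `lam` favors the less relevant but non-overlapping match over a near-duplicate of one already chosen.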
