Fast reachability query computation on big attributed graphs

Attributed graphs are widely used to model a variety of information networks, such as the web, sensor networks, biological networks, economic graphs, and social networks. Given their popularity, in this paper we study one of the most fundamental query types on attributed graphs: the reachability query with attribute constraints, i.e., ‘Is there a path from source to destination such that all attributes on the path satisfy the given attribute constraints?’ We first introduce a new approach that takes advantage of a ‘perfect’ hash function to compress a multi-dimensional attribute into a unique hash value, so as to bound the expected cost of secondary storage access (e.g., SSD I/O, remote data access). We then propose a synopsis-based heuristic search technique that further reduces CPU and secondary storage access costs. For both techniques, index construction time and space remain linear in the size of the graph topology. An extensive experimental evaluation on real and synthetic graphs demonstrates the superiority of the proposed techniques.
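To make the query concrete, the following is a minimal sketch of an attribute-constrained reachability check. It is a plain breadth-first search baseline, not the index or synopsis-based search proposed in the paper. Several things here are assumptions for illustration only: attributes are attached to vertices, each attribute dimension has a small finite integer domain, and the hypothetical helper `encode` uses a mixed-radix encoding as a stand-in for a collision-free ("perfect") mapping from an attribute tuple to a single integer. The names `encode`, `reachable`, `domain_sizes`, and `allowed` do not come from the paper.

```python
from collections import deque


def encode(attrs, domain_sizes):
    """Map an attribute tuple to a unique integer (mixed-radix encoding).

    Assumption: each value lies in range(size) for its dimension, so
    distinct tuples always get distinct codes.
    """
    code = 0
    for value, size in zip(attrs, domain_sizes):
        if not 0 <= value < size:
            raise ValueError("attribute value outside its domain")
        code = code * size + value
    return code


def reachable(adj, vertex_code, allowed_codes, source, target):
    """BFS that only passes through vertices whose code is allowed."""
    if vertex_code[source] not in allowed_codes:
        return False
    if vertex_code[target] not in allowed_codes:
        return False
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj.get(u, ()):
            if v not in seen and vertex_code[v] in allowed_codes:
                seen.add(v)
                queue.append(v)
    return False


# Toy graph: two attribute dimensions with domains of size 2 and 3.
domain_sizes = (2, 3)
attrs = {"a": (0, 1), "b": (1, 2), "c": (0, 1), "d": (0, 0)}
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
vertex_code = {v: encode(t, domain_sizes) for v, t in attrs.items()}

# Constraint: the first attribute must equal 0 on every vertex of the path.
allowed = {encode((0, y), domain_sizes) for y in range(domain_sizes[1])}
print(reachable(adj, vertex_code, allowed, "a", "d"))
```

In this toy graph the path a, c, d satisfies the constraint while the path through b does not, so the query succeeds only via c. Comparing a single integer code per vertex, rather than a full attribute tuple, is the kind of saving the hashing step is aiming at; the actual hash function and storage layout in the paper may differ.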
