Paper information - External memory K-bisimulation reduction of big graphs

External memory K-bisimulation reduction of big graphs

In this paper, we present what are, to our knowledge, the first I/O-efficient solutions for computing the k-bisimulation partition of a massive directed graph and for maintaining such a partition under updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence that intuitively groups together nodes sharing fundamental structural features. k-bisimulation is the standard variant of bisimulation in which the topological features of nodes are considered only within a local neighborhood of radius k > 0. The I/O cost of our partition construction algorithm is bounded by O(k · sort(|Et|) + k · scan(|Nt|) + sort(|Nt|)), while our maintenance algorithms are bounded by O(k · sort(|Et|) + k · scan(|Nt|)). The space complexity bounds are O(|Nt| + |Et|) and O(k · |Nt| + k · |Et|), respectively. Here, |Et| and |Nt| are the numbers of disk pages occupied by the input graph's edge set and node set, respectively, and sort(n) and scan(n) are the costs of sorting and scanning, respectively, a file occupying n pages in external memory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.
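To make the notion of a k-bisimulation partition concrete, the following is a minimal in-memory sketch in Python. It is not the paper's external-memory algorithm. It assumes a common signature-based formulation: nodes start in blocks given by their node label, and in each of k rounds a node's signature is its initial block together with the set of (edge label, block of target) pairs over its outgoing edges. The function name, the input layout, and the use of node and edge labels are illustrative assumptions, not details taken from the abstract.

```python
from collections import defaultdict


def k_bisimulation_partition(node_labels, edges, k):
    """Return {node: block_id} after k rounds of signature refinement.

    node_labels: dict mapping node -> label (assumed input layout)
    edges: iterable of (source, edge_label, target) triples (assumed)
    """
    # Group outgoing edges by source node.
    out_edges = defaultdict(list)
    for src, lbl, dst in edges:
        out_edges[src].append((lbl, dst))

    def assign_ids(signatures):
        # Nodes with equal signatures receive the same block id.
        ids = {}
        return {n: ids.setdefault(sig, len(ids)) for n, sig in signatures.items()}

    # Round 0: nodes are equivalent iff they carry the same label.
    block0 = assign_ids({n: (lab,) for n, lab in node_labels.items()})
    block = block0

    # Rounds 1..k: refine using the previous round's blocks of the targets.
    for _ in range(k):
        signatures = {
            n: (
                block0[n],
                frozenset((lbl, block[dst]) for lbl, dst in out_edges.get(n, ())),
            )
            for n in node_labels
        }
        block = assign_ids(signatures)
    return block


# Hypothetical toy graph: a -> b -> c and d -> e, all with the same labels.
labels = {n: "x" for n in "abcde"}
toy_edges = [("a", "l", "b"), ("b", "l", "c"), ("d", "l", "e")]

part1 = k_bisimulation_partition(labels, toy_edges, 1)
part2 = k_bisimulation_partition(labels, toy_edges, 2)
print(part1)
print(part2)
```

With k = 1, nodes a, b, and d fall into one block because each has one outgoing edge, while c and e form another. With k = 2, a separates from b and d, since its successor b itself has a successor. The sketch keeps the whole graph in memory. The contribution described in the abstract is to obtain such a partition with sorting and scanning passes over disk-resident node and edge files.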
