Paper Information - Distributed Query Processing on Compressed Graphs Using K2-Trees

Web and social graphs can be represented compactly with the K2-tree, which achieves compression ratios of about 5 bits per link for Web graphs and about 20 bits per link for social graphs. The K2-tree also supports fast processing of relevant queries, such as retrieving direct and reverse neighbours, directly on the compressed graph. These two properties make the K2-tree suitable for inclusion in Web search engines, which must maintain very large graphs and process on-line queries on them. Typically these search engines are deployed on dedicated clusters of distributed-memory processors, in which the data set is partitioned and replicated to enable low query response time and high query throughput. In this context a practical strategy is simply to distribute the data across the processors and build local data structures for efficient retrieval in each processor. However, the way the data set is distributed across the processors can have a significant impact on performance. In this paper, we evaluate a number of data distribution strategies that are suitable for the K2-tree and identify the alternative with the best general performance. Our study considers different data sets and focuses on metrics such as overall compression ratio and parallel response time for retrieving direct and reverse neighbours.
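To make the two query types named in the abstract concrete, the following is a minimal single-machine sketch of a K2-tree with k = 2, written in Python. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the class name `K2Tree`, its methods, and the toy edge list are hypothetical, the bitmaps are plain Python lists rather than compact bit vectors, and rank is answered from a prefix-sum array instead of a succinct rank structure. The abstract does not describe the individual distribution strategies, so none is sketched here.

```python
from collections import deque


class K2Tree:
    """Toy k2-tree over an n x n adjacency matrix (illustrative names only)."""

    def __init__(self, edges, n, k=2):
        self.k = k
        size = k
        while size < n:          # pad the matrix side to a power of k
            size *= k
        self.size = size
        self.T, self.L = [], []  # T: internal levels, L: last level (cells)
        # Breadth-first subdivision: each submatrix splits into k*k children.
        queue = deque([(0, 0, size, list(edges))])
        while queue:
            r0, c0, sz, cell_edges = queue.popleft()
            child = sz // k
            buckets = [[] for _ in range(k * k)]
            for u, v in cell_edges:
                idx = ((u - r0) // child) * k + (v - c0) // child
                buckets[idx].append((u, v))
            for idx, bucket in enumerate(buckets):
                bit = 1 if bucket else 0
                if child == 1:
                    self.L.append(bit)
                else:
                    self.T.append(bit)
                    if bit:
                        queue.append((r0 + (idx // k) * child,
                                      c0 + (idx % k) * child, child, bucket))
        # rank[i] = number of 1 bits in T[0:i] (naive stand-in for succinct rank)
        self.rank = [0]
        for b in self.T:
            self.rank.append(self.rank[-1] + b)

    def _bit(self, pos):
        # Positions index the conceptual concatenation T followed by L.
        return self.T[pos] if pos < len(self.T) else self.L[pos - len(self.T)]

    def _neighbours(self, v, reverse):
        out = []
        # (first child position, submatrix size, v's offset, other-axis base)
        stack = [(0, self.size, v, 0)]
        while stack:
            start, size, fixed, base = stack.pop()
            child = size // self.k
            f = fixed // child   # block along the fixed axis that contains v
            for j in range(self.k):
                idx = j * self.k + f if reverse else f * self.k + j
                pos = start + idx
                if not self._bit(pos):
                    continue     # empty submatrix: prune this branch
                other = base + j * child
                if child == 1:
                    out.append(other)
                else:
                    stack.append((self.rank[pos + 1] * self.k ** 2,
                                  child, fixed % child, other))
        return sorted(out)

    def direct(self, v):
        """Nodes w such that the edge v -> w exists (scan of row v)."""
        return self._neighbours(v, reverse=False)

    def reverse(self, v):
        """Nodes u such that the edge u -> v exists (scan of column v)."""
        return self._neighbours(v, reverse=True)


# Hypothetical toy graph on 4 nodes.
edges = [(0, 1), (0, 3), (1, 2), (2, 0), (3, 2)]
graph = K2Tree(edges, n=4)
print(graph.direct(0), graph.reverse(2))
```

Both queries walk the same tree and differ only in whether the row or the column is held fixed, which is why the structure serves direct and reverse neighbours symmetrically without storing the transposed graph.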
