Delta-K²-tree for Compact Representation of Web Graphs

The structure of the World Wide Web can be represented by a directed graph called the web graph. Web graphs are used in a wide range of applications, but their increasingly large scale poses great challenges to traditional memory-resident graph algorithms. In the literature, the K²-tree compresses web graphs efficiently while supporting fast queries directly on the compressed data. Inspired by the K²-tree, we propose the Delta-K²-tree compression approach, which exploits the similarity between neighboring nodes in web graphs. In addition, we design a node reordering algorithm to further improve the compression ratio. We compare our approach with state-of-the-art algorithms, including K²-tree, WebGraph, and AD. Experimental results on four web graph datasets show that the Delta-K²-tree approach outperforms the other three in compression ratio (1.66-2.55 bits per link) while supporting fast forward and reverse queries on the graph.
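
To make the baseline concrete, the following is a minimal Python sketch of a plain K²-tree with k = 2, the structure the abstract builds on. It is not the Delta-K²-tree itself: the delta step and the node reordering algorithm are not specified in the abstract and are not reproduced here. The function names (`build_k2tree`, `neighbors`), the edge-list input, and the use of a prefix-sum list in place of a succinct rank structure are all assumptions made for illustration.

```python
from itertools import accumulate

def build_k2tree(edges, n, k=2):
    """Return (bits, size): the level-order bitmap of a k^2-tree and the padded matrix size.

    Illustrative sketch only; assumes n >= 2 and node ids in range(n).
    """
    size = 1
    while size < n:          # pad the adjacency matrix to a power of k
        size *= k
    bits = []
    current = [(0, 0, list(edges))]   # (row origin, column origin, edges inside)
    sub = size
    while sub > 1:
        sub //= k
        nxt = []
        for r0, c0, es in current:
            buckets = [[] for _ in range(k * k)]
            for u, v in es:
                buckets[((u - r0) // sub) * k + (v - c0) // sub].append((u, v))
            for i, b in enumerate(buckets):
                bits.append(1 if b else 0)   # 1 = submatrix contains at least one edge
                if b:
                    nxt.append((r0 + (i // k) * sub, c0 + (i % k) * sub, b))
        current = nxt
    return bits, size

def neighbors(bits, size, x, reverse=False, k=2):
    """Successors of x (forward query) or predecessors of x (reverse query)."""
    rank = list(accumulate(bits))     # rank[p] = number of 1s in bits[0..p]
    out = []

    def visit(start, sub, x_rel, offset):
        sub //= k
        block = x_rel // sub
        for j in range(k):
            # forward: fix the row block, scan columns; reverse: fix the column block, scan rows
            p = start + (j * k + block if reverse else block * k + j)
            if bits[p]:
                if sub == 1:
                    out.append(offset + j)
                else:
                    # children of the node at position p start at rank[p] * k^2
                    visit(rank[p] * k * k, sub, x_rel % sub, offset + j * sub)

    visit(0, size, x, 0)
    return out

edges = [(0, 1), (0, 3), (1, 2), (2, 0), (3, 1), (3, 3)]
bits, size = build_k2tree(edges, n=4)
print(neighbors(bits, size, 0))                 # forward query on node 0
print(neighbors(bits, size, 1, reverse=True))   # reverse query on node 1
```

Both query directions walk the same bitmap, which is why a K²-tree can answer forward and reverse queries without storing the transposed graph. A practical implementation would replace the prefix-sum list with a compact rank structure over the bitmap.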
