Permuting Web and Social Graphs

Abstract

Since the first investigations on web-graph compression, it has been clear that the ordering of the nodes of a web graph has a fundamental influence on the compression rate (usually expressed as the number of bits per link). The authors of the LINK database [Randall et al. 02], for instance, investigated three different approaches: an extrinsic ordering (URL ordering) and two intrinsic orderings based on the rows of the adjacency matrix (lexicographic and Gray code); they concluded that URL ordering has many advantages in spite of a small penalty in compression. In this paper we approach this issue in a more systematic way, testing some known orderings and proposing some new ones. Our experiments are run in the WebGraph framework [Boldi and Vigna 04], and show that results vary significantly with the compression technique and the structure of the graph. In particular, we show that for a transposed web graph, URL ordering is significantly less effective, and that some new mixed orderings combining host information and Gray/lexicographic orderings outperform all previous methods: in some large transposed graphs they yield a remarkable compression rate of 1 bit per link. We also apply these simple ideas to some nonweb social networks and obtain extremely promising results, very close to those recently achieved using shingle orderings and backlink compression schemes [Chierichetti et al. 09].
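To make the two intrinsic orderings concrete, the following minimal Python sketch sorts the nodes of a tiny graph by the rows of its adjacency matrix, either lexicographically or in reflected Gray code order, and then estimates a bits-per-link figure. All function names here are hypothetical, the dense row representation is only suitable for toy graphs, and the cost measure (Elias gamma coding of successor gaps) is an assumed stand-in, not the compression scheme used by the WebGraph framework.

```python
from itertools import accumulate
from operator import xor


def gray_key(row):
    """Sort key placing 0/1 rows in reflected Gray code order.

    The rank of a Gray codeword is the binary number whose bits are
    the prefix XORs of the codeword, so comparing prefix XORs
    lexicographically compares Gray ranks.
    """
    return tuple(accumulate(row, xor))


def row_ordering(succ, key=lambda row: row):
    """Return node ids sorted by their adjacency-matrix row.

    succ[i] is the list of successors of node i. The default key gives
    the lexicographic ordering; pass key=gray_key for Gray ordering.
    """
    n = len(succ)
    sets = [set(s) for s in succ]
    rows = [tuple(int(j in s) for j in range(n)) for s in sets]
    return sorted(range(n), key=lambda i: key(rows[i]))


def permute(succ, order):
    """Relabel the graph so that node order[k] becomes node k."""
    new_id = {old: new for new, old in enumerate(order)}
    return [sorted(new_id[t] for t in succ[old]) for old in order]


def gamma_bits_per_link(succ):
    """Crude cost proxy: gamma-code the gaps between sorted successors."""
    total = links = 0
    for s in succ:
        prev = -1
        for t in sorted(s):
            gap = t - prev                     # always >= 1
            total += 2 * gap.bit_length() - 1  # Elias gamma code length
            links += 1
            prev = t
    return total / links if links else 0.0


if __name__ == "__main__":
    graph = [[3, 4], [0, 2], [3, 4], [0, 2], [1]]
    for name, key in (("lexicographic", lambda r: r), ("Gray", gray_key)):
        order = row_ordering(graph, key)
        cost = gamma_bits_per_link(permute(graph, order))
        print(name, order, round(cost, 3))
```

On real web graphs the rows would be compared through their sorted successor lists rather than materialized as dense bit vectors; the dense form is used here only to keep the correspondence with the adjacency matrix explicit.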

[1]  Donald E. Knuth,et al.  The Art of Computer Programming: Volume IV: Fascicle 2: Generating All Tuples and Permutations , 2005 .

[2]  George Karypis,et al.  Multilevel k-way Partitioning Scheme for Irregular Graphs , 1998, J. Parallel Distributed Comput..

[3]  Jean-Loup Guillaume,et al.  Efficient and Simple Encodings for the Web Graph , 2002, WAIM.

[4]  Gonzalo Navarro,et al.  k²-Trees for Compact Web Graph Representation , 2009, SPIRE.

[5]  Roi Blanco,et al.  Document Identifier Reassignment Through Dimensionality Reduction , 2005, ECIR.

[6]  Alberto Apostolico,et al.  Graph Compression by BFS , 2009, Algorithms.

[7]  A. Moffat,et al.  Offline dictionary-based compression , 2000, Proceedings DCC'99 Data Compression Conference (Cat. No. PR00096).

[8]  Guy Jacobson,et al.  Space-efficient static trees and graphs , 1989, 30th Annual Symposium on Foundations of Computer Science.

[9]  Sebastiano Vigna,et al.  Permuting Web Graphs , 2009, WAW.

[10]  Tien-Fu Chen,et al.  Inverted file compression through document identifier reassignment , 2003, Inf. Process. Manag..

[11]  György Turán,et al.  On the succinct representation of graphs , 1984, Discret. Appl. Math..

[12]  Sebastiano Vigna,et al.  Codes for the World Wide Web , 2005, Internet Math..

[13]  Torsten Suel,et al.  Compressing the graph structure of the Web , 2001, Proceedings DCC 2001. Data Compression Conference.

[14]  Raymie Stata,et al.  The Link Database: fast access to graphs of the Web , 2002, Proceedings DCC 2002. Data Compression Conference.

[15]  Kumar Chellapilla,et al.  Speeding up algorithms on compressed web graphs , 2009, WSDM '09.

[16]  Gonzalo Navarro,et al.  A Fast and Compact Web Graph Representation , 2007, SPIRE.

[17]  Dongwon Lee,et al.  On six degrees of separation in DBLP-DB and more , 2005 .

[18]  Sebastiano Vigna,et al.  The webgraph framework I: compression techniques , 2004, WWW '04.

[19]  Eli Upfal,et al.  The Web as a graph , 2000, PODS.

[20]  Dongwon Lee,et al.  On six degrees of separation in DBLP-DB and more , 2005, SGMD.

[21]  Guy E. Blelloch,et al.  Compact representations of separable graphs , 2003, SODA '03.

[22]  Peter Elias,et al.  Efficient Storage and Retrieval by Content and Address of Static Files , 1974, JACM.

[23]  Konstantin Avrachenkov,et al.  Proceedings of the 6th International Workshop on Algorithms and Models for the Web-Graph , 2009 .

[24]  Micah Adler,et al.  Towards compressing Web graphs , 2001, Proceedings DCC 2001. Data Compression Conference.

[25]  Guy E. Blelloch,et al.  Index compression through document reordering , 2002, Proceedings DCC 2002. Data Compression Conference.

[26]  Fabrizio Silvestri,et al.  Sorting Out the Document Identifier Assignment Problem , 2007, ECIR.

[27]  Tsuyoshi Ito,et al.  Compact Encoding of the Web Graph Exploiting Various Power Laws: Statistical Reason Behind Link Database , 2003, WAIM.

[28]  Andrei Z. Broder,et al.  The Connectivity Server: Fast Access to Linkage Information on the Web , 1998, Comput. Networks.

[29]  Silvio Lattanzi,et al.  On compressing social networks , 2009, KDD.

[30]  Sriram Raghavan,et al.  Representing Web graphs , 2003, Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405).

[31]  Takao Nishizeki,et al.  Efficient Compression of Web Graphs , 2008, COCOON.

[32]  Gregory Buehrer,et al.  A scalable pattern mining approach to web graph compression with communities , 2008, WSDM '08.

[33]  Donald E. Knuth,et al.  The Art of Computer Programming, Volume 4, Fascicle 2: Generating All Tuples and Permutations (Art of Computer Programming) , 2005 .

[34]  Moni Naor.  Succinct representation of general unlabeled graphs , 1990, Discret. Appl. Math..