On compressing massive streaming graphs with Quadtrees

Social networks change constantly as new members join, existing members leave, and "follower" or "friend" relationships form and disappear. The streaming graph model captures this constantly changing graph. Given a massive graph data stream in which the number of nodes is on the order of millions and the number of edges is in the tens of millions, we propose a simple algorithm that compresses the graph without reading the entire graph into main memory. Our algorithm produces the compressed graph output using a quadtree data structure that is constructed implicitly. As a result of this implicit construction, our algorithm supports node and edge additions and deletions that directly modify the compressed output graph. We further develop algorithms that answer edge queries (is there an edge between two given nodes?) and node queries (for a given node, list all its neighbors) by operating directly on the compressed graph. We have performed extensive empirical evaluations of our algorithms on publicly available, large social networks such as LiveJournal, Pokec, Twitter, and others. Our empirical evaluation considers several parameters, including the time to compress, the memory required by the compression algorithm, the size of the compressed graph, and the time and memory required to execute queries. We also present extensions to the compression algorithm.
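To make the idea of a quadtree over a graph concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes the quadtree is built explicitly over the adjacency matrix (the abstract describes an implicit construction whose details are not given here), that node identifiers are integers in the range 0 to 2^levels - 1, and that edges are directed. The class name `QuadtreeGraph` and all method names are hypothetical. All-zero quadrants are simply not stored, which is where the space saving on sparse graphs would come from, and edges can be added or removed one at a time as they arrive in a stream.

```python
class QuadtreeGraph:
    """Illustrative sparse quadtree over a 2**levels x 2**levels adjacency matrix.

    Hypothetical sketch: each internal node is a dict keyed by quadrant index
    (0..3); a missing key means the whole quadrant contains no edges.
    Assumes levels >= 1 and node ids in [0, 2**levels).
    """

    def __init__(self, levels):
        self.levels = levels
        self.root = {}

    def _path(self, u, v):
        # One quadrant index per level, most significant bit first:
        # the bit of u picks the row half, the bit of v picks the column half.
        for shift in range(self.levels - 1, -1, -1):
            yield (((u >> shift) & 1) << 1) | ((v >> shift) & 1)

    def add_edge(self, u, v):
        node = self.root
        for q in self._path(u, v):
            node = node.setdefault(q, {})

    def has_edge(self, u, v):
        # Edge query: follow the path and stop at the first empty quadrant.
        node = self.root
        for q in self._path(u, v):
            node = node.get(q)
            if node is None:
                return False
        return True

    def remove_edge(self, u, v):
        trail, node = [], self.root
        for q in self._path(u, v):
            if q not in node:
                return False          # edge was not present
            trail.append((node, q))
            node = node[q]
        # Delete the leaf, then prune ancestors that became empty.
        for parent, q in reversed(trail):
            del parent[q]
            if parent:
                break
        return True

    def neighbors(self, u):
        # Node query: visit only the quadrants that intersect row u.
        out = []

        def walk(node, depth, col):
            if depth == self.levels:
                out.append(col)
                return
            row_bit = (u >> (self.levels - 1 - depth)) & 1
            for col_bit in (0, 1):
                child = node.get((row_bit << 1) | col_bit)
                if child is not None:
                    walk(child, depth + 1, (col << 1) | col_bit)

        walk(self.root, 0, 0)
        return out


# Usage: process a small edge stream on an 8-node graph (levels = 3).
g = QuadtreeGraph(3)
for u, v in [(0, 1), (0, 5), (2, 3), (7, 0)]:
    g.add_edge(u, v)
print(g.has_edge(0, 5))   # True
print(g.neighbors(0))     # [1, 5]
g.remove_edge(0, 5)
print(g.neighbors(0))     # [1]
```

In this sketch both query types descend at most `levels` steps per returned answer, and neither needs the uncompressed adjacency matrix; an actual implementation would likely replace the nested dictionaries with a compact encoding of the tree.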
