Query preserving graph compression

It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. This motivates us to propose query preserving graph compression, which compresses graphs relative to a class Λ of queries of the users' choice. We compute a small graph Gr from a graph G such that (a) for any query Q ∈ Λ, Q(G) = Q'(Gr), where Q' ∈ Λ can be efficiently computed from Q; and (b) any algorithm for computing Q(G) can be applied directly to evaluate Q' on Gr as is. That is, while we cannot lower the complexity of evaluating graph queries, we reduce the data graphs while preserving the answers to all the queries in Λ. To verify the effectiveness of this approach, (1) we develop compression strategies for two classes of queries: reachability queries and graph pattern queries via (bounded) simulation. We show that graphs can be efficiently compressed via a reachability equivalence relation and graph bisimulation, respectively, while preserving query answers. (2) We provide techniques for maintaining the compressed graph Gr in response to changes ΔG to the original graph G. We show that the incremental maintenance problems are unbounded for the two classes of queries, i.e., their costs are not a function of the size of ΔG and the changes in Gr. Nevertheless, we develop incremental algorithms that depend only on ΔG and Gr, independent of G, i.e., we do not have to decompress Gr to propagate the changes. (3) Using real-life data, we experimentally verify that our compression techniques can reduce graphs on average by 95% for reachability and 57% for graph pattern matching, and that our incremental maintenance algorithms are efficient.
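To make the reachability case concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes that two nodes are reachability-equivalent when they have the same ancestors and the same descendants (the abstract does not spell out the relation), and it computes closures naively, one traversal per node, which would not scale to the graph sizes discussed above. The names `closure`, `compress_for_reachability`, and `reaches` are hypothetical. Each equivalence class becomes one node of Gr, a query Q(u, v) is rewritten to Q'([u], [v]), and the same `reaches` routine is run unchanged on either graph.

```python
from collections import defaultdict

def closure(adj, nodes):
    """Map each node to the set of nodes it reaches via a nonempty path."""
    out = {}
    for s in nodes:
        seen, stack = set(), list(adj.get(s, ()))
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(adj.get(v, ()))
        out[s] = frozenset(seen)
    return out

def compress_for_reachability(nodes, edges):
    """Merge nodes with identical ancestor and descendant sets (assumed relation)."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)
    desc, anc = closure(fwd, nodes), closure(bwd, nodes)
    class_ids, node_class = {}, {}
    for v in nodes:
        # Nodes sharing the same (ancestors, descendants) pair share a class id.
        node_class[v] = class_ids.setdefault((anc[v], desc[v]), len(class_ids))
    # An edge of Gr links two classes whenever some members are linked in G.
    gr_edges = {(node_class[u], node_class[v]) for u, v in edges}
    return node_class, gr_edges

def reaches(edges, s, t):
    """Reachability by a nonempty path; works on G and on Gr alike."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    return t in closure(adj, [s])[s]

# Illustrative graph: a fans out to b1..b3, which all point to c; d and e form a cycle.
nodes = ["a", "b1", "b2", "b3", "c", "d", "e"]
edges = [("a", "b1"), ("a", "b2"), ("a", "b3"),
         ("b1", "c"), ("b2", "c"), ("b3", "c"),
         ("c", "d"), ("d", "e"), ("e", "d")]
node_class, gr_edges = compress_for_reachability(nodes, edges)

# Every reachability query on G gets the same answer from the rewritten query on Gr.
answers_agree = all(
    reaches(edges, u, v) == reaches(gr_edges, node_class[u], node_class[v])
    for u in nodes for v in nodes
)
```

In this toy graph, b1, b2, and b3 collapse into one class and the cycle {d, e} collapses into one class with a self-loop, so Gr has 4 nodes and 4 edges instead of 7 nodes and 9 edges. The paper's actual construction may differ in how it defines the relation and builds the edge set of Gr.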
