Efficient snapshot retrieval over historical graph data

We present a distributed graph database system for managing historical data for large evolving information networks, with the goal of enabling temporal and evolutionary queries and analysis. The cornerstone of our system is DeltaGraph, a novel, user-extensible, highly tunable, distributed hierarchical index structure that compactly records historical network information and supports efficient retrieval of historical graph snapshots for single-site or parallel processing. Our system exposes a general programmatic API for processing and analyzing the retrieved snapshots. Along with the original graph data, DeltaGraph can also maintain and index auxiliary information; this functionality can be used to extend the structure to efficiently execute queries such as subgraph pattern matching over historical data. We develop analytical models of both the storage space required and the snapshot retrieval times to aid in choosing the right construction parameters for a specific scenario. We also present GraphPool, an in-memory graph data structure that can maintain hundreds of historical graph instances in main memory in a non-redundant manner. A comprehensive experimental evaluation illustrates the effectiveness of the proposed techniques at managing historical graph information.
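
To make the idea of retrieving a snapshot from a hierarchical, delta-based index more concrete, here is a minimal single-machine sketch. It is not the DeltaGraph implementation: the names `Delta`, `IndexNode`, and `retrieve_snapshot` are hypothetical, the graph is reduced to a set of edges, and the sketch assumes each index node stores only its difference from its parent, so that a snapshot is rebuilt by applying the deltas along the path from the root to a leaf.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Delta:
    """Hypothetical delta: edges added and removed relative to the parent node."""
    added: frozenset = frozenset()
    removed: frozenset = frozenset()

    def apply(self, edges: set) -> set:
        # Drop the removed edges first, then add the new ones.
        return (edges - self.removed) | self.added


@dataclass
class IndexNode:
    """Hypothetical index node; leaves stand for snapshots at given timepoints."""
    name: str
    delta_from_parent: Delta = field(default_factory=Delta)
    parent: Optional["IndexNode"] = None


def retrieve_snapshot(leaf: IndexNode) -> set:
    """Rebuild the edge set at `leaf` by applying deltas from the root down."""
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = node.parent
    edges: set = set()
    for node in reversed(path):  # root first, leaf last
        edges = node.delta_from_parent.apply(edges)
    return edges


# Assumed toy data: the root holds the edges shared by both snapshots' history.
root = IndexNode("root", Delta(added=frozenset({("a", "b")})))
t1 = IndexNode("t1", Delta(added=frozenset({("b", "c")})), parent=root)
t2 = IndexNode(
    "t2",
    Delta(added=frozenset({("c", "d")}), removed=frozenset({("a", "b")})),
    parent=root,
)

snapshot_t1 = retrieve_snapshot(t1)
snapshot_t2 = retrieve_snapshot(t2)
```

In this toy setup, information common to several snapshots is stored once near the root, and each leaf carries only a small delta. Presumably this is the kind of trade-off between storage space and retrieval time that the analytical models mentioned above are meant to guide.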
