An Efficient Snapshot Strategy for Dynamic Graph Storage Systems to Support Historical Queries

Large-scale dynamic graphs typically involve large volumes of data. Dynamic graph storage systems are increasingly required to recreate any historical state in order to support historical queries. A typical storage solution for historical queries is known as ‘snapshot plus log’: a snapshot records the complete data at a certain moment, while a log file saves all update operations. A historical state is then recreated from the nearest snapshot by redoing or undoing the relevant update operations saved in the log. The challenge is to minimize both the number of snapshots and the number of redo and undo operations performed during historical state recreation. Traditional systems create snapshots at regular intervals. However, historical states are not all requested with the same frequency, so regularly spaced snapshots may lie far from the states that are actually queried, which makes this strategy inefficient. This paper proposes a new strategy that determines the timestamps of the snapshots based on the distribution of historical queries. First, the historical queries are clustered into a given number of groups according to the timestamps of the requested historical states, and the cluster centroids are calculated. Second, snapshots are created at the timestamps of the cluster centroids. Since the cluster centroids may change over time, this process is executed periodically. Experimental results show that, at the same storage cost, the proposed snapshot strategy greatly improves the performance of recreating historical states, reducing computation by at least 70.7% in terms of the number of redone and undone operations. Moreover, with the same recreation performance guarantee, it reduces storage by nearly 78.9% on average in terms of the number of snapshots.
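The cost model and the placement idea described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's implementation: the log is modelled as a sorted list of integer operation timestamps, a snapshot taken at time s is assumed to contain every operation with timestamp <= s, and the clustering step uses plain one-dimensional k-means (Lloyd's algorithm), because the abstract does not name the clustering algorithm. All function names (`recreation_cost`, `query_cost`, `kmeans_1d`, `clustered_snapshots`, `regular_snapshots`) and the query workload are hypothetical.

```python
import bisect
from typing import List, Sequence


def recreation_cost(op_times: Sequence[int], snapshot: int, target: int) -> int:
    """Logged operations to redo (snapshot < target) or undo (snapshot > target).

    Assumption: op_times is sorted and a snapshot at time s holds every op <= s.
    """
    lo, hi = sorted((snapshot, target))
    return bisect.bisect_right(op_times, hi) - bisect.bisect_right(op_times, lo)


def query_cost(op_times: Sequence[int], snapshots: Sequence[int], target: int) -> int:
    # Recreate from whichever snapshot needs the fewest redo/undo operations.
    return min(recreation_cost(op_times, s, target) for s in snapshots)


def kmeans_1d(points: Sequence[float], k: int, iters: int = 100) -> List[float]:
    """Lloyd's k-means on scalar timestamps (a hypothetical choice of clustering)."""
    pts = sorted(points)
    # Deterministic initialisation: place the initial centroids at spread-out quantiles.
    centroids = [float(pts[(2 * i + 1) * len(pts) // (2 * k)]) for i in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: abs(p - centroids[j]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids


def clustered_snapshots(query_times: Sequence[int], k: int) -> List[int]:
    # Snapshot timestamps are the cluster centroids, rounded to whole time units.
    return sorted(round(c) for c in kmeans_1d(query_times, k))


def regular_snapshots(start: int, end: int, k: int) -> List[int]:
    # Baseline: one snapshot at the midpoint of each of k equal intervals of [start, end].
    step = (end - start) / k
    return [round(start + step * (i + 0.5)) for i in range(k)]


def total_cost(op_times: Sequence[int], snapshots: Sequence[int], queries: Sequence[int]) -> int:
    return sum(query_cost(op_times, snapshots, q) for q in queries)


# Hypothetical workload: one update per time unit, queries concentrated in two periods.
op_times = list(range(1, 1001))
queries = [100, 105, 110, 95, 900, 905, 910, 895]

clustered = clustered_snapshots(queries, 2)
regular = regular_snapshots(0, 1000, 2)
print(clustered, total_cost(op_times, clustered, queries))  # [102, 902] 40
print(regular, total_cost(op_times, regular, queries))      # [250, 750] 1200
```

With the same number of snapshots (the same storage cost in this toy model), placing them at the query centroids needs 40 redo/undo operations instead of 1200 for evenly spaced snapshots, because the queries are skewed toward two periods. Note that k-means minimizes squared distance rather than the operation count itself, so it is only a proxy here; re-running `clustered_snapshots` on a recent window of queries is one way to mimic the periodic re-execution the abstract describes.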
