An I/O-Efficient Buffer Batch Replacement Policy for Update-Intensive Graph Databases

With the proliferation of graph-based applications, such as social network management and Web structure mining, update-intensive graph databases have become an important component of today's data management platforms. Several techniques have recently been proposed to exploit locality in both the data organization and the computational model of graph databases. However, little investigation has been conducted on buffer management for graph databases. To the best of our knowledge, current buffer managers of graph databases suffer performance loss caused by unnecessary random I/O accesses. To solve this problem, we develop a novel batch replacement policy for buffer management. This policy enables us to maximally exploit sequential I/O to improve the performance of graph databases. To enable the policy, we devise a segment-tree-based buffer manager that efficiently maintains an optimal replacement plan. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of our method.
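
The abstract does not spell out how the segment tree is used, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that pages are grouped into fixed-size disk extents, that the buffer manager tracks which pages are dirty, and that a good batch to write back is the extent holding the most dirty pages, since those pages can be written in one mostly sequential pass. The names `DirtyExtentTree`, `BatchBuffer`, `mark_dirty`, and `flush_batch` are hypothetical.

```python
class DirtyExtentTree:
    """Max segment tree; leaf i holds the dirty-page count of extent i (assumed layout)."""

    def __init__(self, num_extents):
        self.n = 1
        while self.n < num_extents:
            self.n *= 2
        # Each node stores (dirty_count, extent_id); padding leaves use id -1.
        self.tree = [(0, -1)] * (2 * self.n)
        for i in range(num_extents):
            self.tree[self.n + i] = (0, i)
        for k in range(self.n - 1, 0, -1):
            self.tree[k] = self._pick(self.tree[2 * k], self.tree[2 * k + 1])

    @staticmethod
    def _pick(a, b):
        # Keep the child with more dirty pages; the left child wins ties.
        return a if a[0] >= b[0] else b

    def add(self, extent, delta):
        """Change the dirty count of one extent and repair the path to the root."""
        k = self.n + extent
        count, _ = self.tree[k]
        self.tree[k] = (count + delta, extent)
        k //= 2
        while k >= 1:
            self.tree[k] = self._pick(self.tree[2 * k], self.tree[2 * k + 1])
            k //= 2

    def densest(self):
        """Return (extent_id, dirty_count) for the extent with the most dirty pages."""
        count, extent = self.tree[1]
        return extent, count


class BatchBuffer:
    """Hypothetical buffer manager that evicts dirty pages one extent at a time."""

    def __init__(self, num_pages, pages_per_extent):
        self.pages_per_extent = pages_per_extent
        num_extents = (num_pages + pages_per_extent - 1) // pages_per_extent
        self.dirty = [set() for _ in range(num_extents)]
        self.tree = DirtyExtentTree(num_extents)

    def mark_dirty(self, page_id):
        extent = page_id // self.pages_per_extent
        if page_id not in self.dirty[extent]:  # count each page only once
            self.dirty[extent].add(page_id)
            self.tree.add(extent, 1)

    def flush_batch(self):
        """Pick the densest extent and return its dirty pages in ascending order."""
        extent, count = self.tree.densest()
        if count == 0:
            return []
        batch = sorted(self.dirty[extent])
        self.dirty[extent].clear()
        self.tree.add(extent, -count)
        return batch


if __name__ == "__main__":
    buf = BatchBuffer(num_pages=16, pages_per_extent=4)
    for page in [1, 9, 10, 5, 11, 2]:
        buf.mark_dirty(page)
    print(buf.flush_batch())
```

In this sketch each update and each batch selection costs O(log n) in the number of extents, which is the usual reason to prefer a segment tree over rescanning the buffer on every eviction. A real policy would also have to weigh access recency and clean pages, which the sketch ignores.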
