Tutorial: Adaptive Replication and Partitioning in Data Systems

To meet growing application demands, distributed data systems replicate and partition data across multiple machines. Replication increases a system's resource and request-processing capacity by placing copies of the data on multiple machines, while partitioning pursues the same objectives by splitting the data across machines. The two techniques present different trade-offs: replication incurs replica maintenance costs, while partitioning incurs multi-machine coordination costs, and system administrators must weigh these carefully. Traditionally, administrators have made replication and partitioning decisions based on their understanding of the application workload, which can result in suboptimal performance if the system is misconfigured or if the workload changes. In contrast, systems that employ replication and partitioning adaptively can adjust these decisions based on workload observations and predictions, which improves performance and reduces complexity for administrators. In this tutorial, we present an overview of the techniques that systems use to adaptively partition and replicate data and services. We focus on the decision-making strategies these systems employ and on how their decisions are executed in an online environment. Finally, we identify opportunities for research in the area.
