NashDB: An End-to-End Economic Method for Elastic Database Fragmentation, Replication, and Provisioning

Distributed data management systems often operate on "elastic" clusters that can scale up or down on demand. These systems face numerous challenges, including data fragmentation, replication, and cluster sizing. Unfortunately, these challenges have traditionally been treated independently, leaving administrators with little insight into how the interplay of these decisions affects query performance. This paper introduces NashDB, an adaptive data distribution framework that relies on an economic model to automatically balance the supply and demand of data fragments, replicas, and cluster nodes. NashDB adapts its decisions to query priorities and shifting workloads, while avoiding underutilized cluster nodes and redundant replicas. We present and evaluate NashDB's model, as well as a suite of optimization techniques designed to efficiently identify a data distribution scheme that matches workload demands and to transition the system to the new scheme with minimal data transfer overhead. Experimentally, we show that NashDB is often Pareto dominant over other solutions.
