Towards Cost-Effective and Elastic Cloud Database Deployment via Memory Disaggregation

It is challenging for cloud-native relational databases to meet the ever-increasing need to scale compute and memory resources independently and elastically. The recent emergence of the memory disaggregation architecture, built on high-speed RDMA networks, offers an opportunity to build cost-effective and elastic cloud-native databases. Existing proposals let unmodified applications run transparently on disaggregated systems. However, running a relational database kernel atop such proposals suffers notable performance degradation and time-consuming failure recovery, offsetting the benefits of disaggregation. To address these challenges, we propose a novel database architecture called LegoBase, which explores the co-design of the database kernel and memory disaggregation. LegoBase pushes memory management back into the database layer, bypassing the Linux I/O stack and reusing or designing (remote) memory access optimizations with an understanding of data access patterns. It further splits the conventional ARIES fault-tolerance protocol to handle local and remote memory failures independently, enabling fast recovery of compute instances. We implemented LegoBase atop MySQL and compare it against MySQL running on a standalone machine and against the state-of-the-art disaggregation proposal Infiniswap. Our evaluation shows that even with a large fraction of data placed in remote memory, LegoBase's throughput (up to a 9.41% drop) and P99 latency (up to an 11.58% increase) are comparable to the monolithic MySQL setup, and significantly outperform (by 1.99x and 2.33x, respectively) the deployment of MySQL over Infiniswap. Meanwhile, LegoBase achieves up to 3.87x and 5.48x speedups in recovery and warm-up time, respectively, over monolithic MySQL and MySQL over Infiniswap when handling failures or planned re-configurations.
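
To make the architectural idea more concrete, the sketch below illustrates what database-layer memory management over a local and a remote (RDMA-attached) memory tier could look like: a page fetch first checks the local buffer pool, then the remote memory pool, and only then falls back to storage. This is a minimal illustration based on the abstract's description, not LegoBase's actual code; all names (TieredBufferPool, fetch_page, remote_read, etc.) are hypothetical, and the remote read is a placeholder for a one-sided RDMA read.

```cpp
// Illustrative sketch only: a two-tier buffer pool lookup in the spirit of
// database-layer memory management over local and remote (RDMA) memory.
// Hypothetical names; not LegoBase's implementation.
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <unordered_map>
#include <vector>

constexpr std::size_t kPageSize = 16 * 1024;  // InnoDB-style 16 KiB pages.

struct Page {
    std::vector<uint8_t> data = std::vector<uint8_t>(kPageSize);
};

class TieredBufferPool {
public:
    // Fetch a page: local buffer first, then remote memory, then storage.
    Page& fetch_page(uint64_t page_id) {
        if (auto it = local_.find(page_id); it != local_.end()) {
            return it->second;                        // Hit in local DRAM.
        }
        Page page;
        if (remote_read(page_id, page)) {             // Hit in the remote pool.
            return cache_locally(page_id, std::move(page));
        }
        storage_read(page_id, page);                  // Miss: read from storage.
        return cache_locally(page_id, std::move(page));
    }

private:
    Page& cache_locally(uint64_t page_id, Page page) {
        // A real buffer pool would also evict cold pages to the remote tier here.
        return local_.emplace(page_id, std::move(page)).first->second;
    }

    bool remote_read(uint64_t page_id, Page& out) {
        // Placeholder: a real system would issue a one-sided RDMA read from a
        // registered remote memory region directly into out.data.
        auto it = remote_.find(page_id);
        if (it == remote_.end()) return false;
        out.data = it->second.data;
        return true;
    }

    void storage_read(uint64_t page_id, Page& out) {
        // Placeholder for a read from durable storage.
        (void)page_id;
        std::memset(out.data.data(), 0, kPageSize);
    }

    std::unordered_map<uint64_t, Page> local_;   // Local DRAM tier.
    std::unordered_map<uint64_t, Page> remote_;  // Stand-in for the remote pool.
};

int main() {
    TieredBufferPool pool;
    Page& p = pool.fetch_page(42);  // First access misses both memory tiers.
    std::cout << "page size: " << p.data.size() << " bytes\n";
}
```

Keeping this lookup path inside the database kernel (rather than behind a transparent swapping layer) is what lets the system skip the Linux I/O stack and exploit its knowledge of data access patterns when deciding which pages live locally and which live in remote memory.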

[1] Abhishek Verma, et al. Large-scale cluster management at Google with Borg, 2015, EuroSys.

[2] Kostas Katrinis, et al. Rack-scale disaggregated cloud data centers: The dReDBox project vision, 2016, Design, Automation & Test in Europe Conference & Exhibition (DATE).

[3] Reza Sherkat, et al. Native Store Extension for SAP HANA, 2019, Proc. VLDB Endow.

[4] Marcos K. Aguilera, et al. Remote regions: a simple abstraction for remote memory, 2018, USENIX ATC.

[5] Siddhartha Sen, et al. Disaggregation and the Application, 2019, HotCloud.

[6] Yiying Zhang, et al. LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation, 2018, OSDI.

[7] Cheng Huang, et al. Erasure Coding in Windows Azure Storage, 2012, USENIX Annual Technical Conference.

[8] Wei Cao, et al. POLARDB Meets Computational Storage: Efficiently Support Analytical Workloads in Cloud-Native Relational Database, 2020, FAST.

[9] Surajit Chaudhuri, et al. Automatically Indexing Millions of Databases in Microsoft Azure SQL Database, 2019, SIGMOD Conference.

[10] Scott Shenker, et al. Network Requirements for Resource Disaggregation, 2016, OSDI.

[11] Ali Anwar, et al. Analyzing Alibaba's Co-located Datacenter Workloads, 2018, IEEE International Conference on Big Data (Big Data).

[12] Feifei Li, et al. Cloud native database systems at Alibaba: Opportunities and Challenges, 2019, Proc. VLDB Endow.

[13] Jacob Nelson, et al. Latency-Tolerant Software Distributed Shared Memory, 2015, USENIX Annual Technical Conference.

[14] Miguel Castro, et al. FaRM: Fast Remote Memory, 2014, NSDI.

[15] Willy Zwaenepoel, et al. Hailstorm: Disaggregated Compute and Storage for Distributed LSM-based Databases, 2020, ASPLOS.

[16] James S. Plank, et al. A tutorial on Reed-Solomon coding for fault-tolerance in RAID-like systems, 1997, Softw. Pract. Exp.

[17] Boon Thau Loo, et al. Understanding the effect of data center resource disaggregation on production DBMSs, 2020, Proc. VLDB Endow.

[18] Qi Liu, et al. TiDB, 2020, Proc. VLDB Endow.

[19] F. Moore, et al. Polynomial Codes Over Certain Finite Fields, 2017.

[20] Carlos Maltzahn, et al. Ceph: a scalable, high-performance distributed file system, 2006, OSDI '06.

[21] Huan Liu, et al. A Measurement Study of Server Utilization in Public Clouds, 2011, IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing.

[22] Srikanth Kandula, et al. Multi-resource packing for cluster schedulers, 2014, SIGCOMM.

[23] Kang G. Shin, et al. Efficient Memory Disaggregation with Infiniswap, 2017, NSDI.

[24] David J. DeWitt, et al. Microsoft Azure SQL database telemetry, 2015, SoCC.

[25] Anurag Gupta, et al. Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases, 2017, SIGMOD Conference.

[26] I-Hsin Chung, et al. Towards a Composable Computer System, 2018, HPC Asia.

[27] Wei Cao, et al. PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database, 2018, Proc. VLDB Endow.

[28] Christopher Frost, et al. Spanner: Google's Globally-Distributed Database, 2012, OSDI.

[29] Dejan S. Milojicic, et al. Beyond Processor-centric Operating Systems, 2015, HotOS.

[30] Mosharaf Chowdhury, et al. Effectively Prefetching Remote Memory with Leap, 2019, USENIX ATC.

[31] Kenneth C. Knowlton, et al. A fast storage allocator, 1965, CACM.

[32] Rodrigo N. Calheiros, et al. Auto-scaling Web Applications in Clouds: A Taxonomy and Survey, 2016.

[33] Gang Chen, et al. Efficient Distributed Memory Management with RDMA and Caching, 2018, Proc. VLDB Endow.

[34] Hamid Pirahesh, et al. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging, 1998.

[35] Krste Asanovic, et al. FireBox: A Hardware Building Block for 2020 Warehouse-Scale Computers, 2014.

[36] Sneha Kumar Kasera, et al. Auto-Scaling Cloud-Based Memory-Intensive Applications, 2020, IEEE 13th International Conference on Cloud Computing (CLOUD).