Transparent Throughput Elasticity for Modern Cloud Storage: An Adaptive Block-Level Caching Proposal

Storage elasticity in the cloud is a crucial feature in the age of data-intensive computing, especially given the fluctuations of I/O throughput. In this chapter, the authors explore how to transparently boost I/O bandwidth during peak utilization to deliver high performance without over-provisioning storage resources. The proposal relies on short-lived virtual disks with better performance characteristics (and a higher price) that act during peaks as a caching layer for the persistent virtual disks where the application data is stored during runtime. First, the authors show how this idea can be realized efficiently at the block-device level, using a caching mechanism that leverages iterative behavior and learns from past experience. Second, they introduce a corresponding performance and cost prediction methodology. They demonstrate the benefits of their proposal both for micro-benchmarks and for two real-life applications using large-scale experiments. They conclude with a discussion of how these techniques can be generalized to the increasingly complex landscape of modern cloud storage.
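The core idea, a fast short-lived disk that temporarily fronts a slower persistent disk, can be illustrated with a minimal sketch. This is not the authors' mechanism: the class `BurstCache`, its method names, and the use of in-memory dictionaries to stand in for virtual disks are all assumptions made for illustration, and the adaptive, iteration-aware behavior described in the chapter is not modeled here.

```python
from typing import Dict, Optional, Set


class BurstCache:
    """Toy write-back block cache in front of a slow persistent disk.

    Hypothetical illustration only: dictionaries stand in for block devices.
    """

    def __init__(self, backing: Dict[int, bytes]) -> None:
        self.backing = backing                        # persistent virtual disk
        self.fast: Optional[Dict[int, bytes]] = None  # short-lived fast disk, None when detached
        self.dirty: Set[int] = set()                  # blocks written only to the fast disk

    def attach_fast_disk(self) -> None:
        """Start of an I/O peak: begin caching on an empty fast disk."""
        self.fast = {}

    def detach_fast_disk(self) -> None:
        """End of the peak: flush dirty blocks, then release the fast disk."""
        if self.fast is None:
            return
        for block in self.dirty:
            self.backing[block] = self.fast[block]
        self.dirty.clear()
        self.fast = None

    def write(self, block: int, data: bytes) -> None:
        if self.fast is not None:
            self.fast[block] = data      # absorb the write on the fast disk
            self.dirty.add(block)
        else:
            self.backing[block] = data   # no cache attached: write through

    def read(self, block: int) -> bytes:
        if self.fast is not None and block in self.fast:
            return self.fast[block]      # cache hit
        data = self.backing[block]
        if self.fast is not None:
            self.fast[block] = data      # populate the cache on a miss
        return data
```

In this sketch, writes issued while the fast disk is attached reach the persistent disk only when `detach_fast_disk()` flushes them, which is the trade-off any write-back cache makes between throughput during the peak and the time needed to drain afterwards.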
