Bursting the Cloud Data Bubble: Towards Transparent Storage Elasticity in IaaS Clouds

Storage elasticity on IaaS clouds is an important feature for data-intensive workloads. Storage requirements can vary greatly during application runtime, which makes worst-case over-provisioning a poor choice: it leaves storage unnecessarily tied up and incurs extra costs for the user. The ability to adapt dynamically to storage requirements is therefore attractive, but how to implement it is not well understood. Current approaches simply rely on users to attach and detach virtual disks to and from the virtual machine (VM) instances and to manage them manually. This greatly increases application complexity while reducing cost efficiency. Unlike such approaches, this paper aims to provide a transparent solution that presents a unified storage space to the VM in the form of a regular POSIX file system. This file system hides the details of attaching and detaching virtual disks by handling those actions transparently, based on dynamic application requirements. The main difficulty in this context is to understand the intent of the application and to regulate the available storage so as to avoid running out of space while minimizing the performance overhead of doing so. To this end, we propose a storage space prediction scheme that analyzes multiple system parameters and dynamically adapts monitoring to the intensity of the I/O in order to track the real usage as closely as possible. Using both synthetic benchmarks and real-life data-intensive applications, we show the value of our proposal over static worst-case over-provisioning and over simpler elastic schemes that rely on a reactive model to attach and detach virtual disks. Our experiments demonstrate that we can reduce storage waste/cost by 30-40% with only 2-5% performance overhead.
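The abstract does not spell out the prediction scheme itself, so the following is only an illustrative sketch of the general idea of a predictive (rather than reactive) elastic policy. It is not the paper's algorithm. It rests on several assumptions: all virtual disks have the same size, growth is estimated linearly over a short sliding window of usage samples, and disks are provisioned for a fixed look-ahead horizon. The sampling period also shrinks as I/O intensity (the absolute growth rate) rises. All names (`PredictivePolicy`, `horizon`, `window`, `next_interval`, `step`) are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PredictivePolicy:
    """Hypothetical predictive attach/detach policy; an illustration, not the paper's scheme."""
    disk_size: int               # bytes per virtual disk (assumed: all disks the same size)
    capacity: int                # bytes currently attached to the VM
    horizon: float = 60.0        # seconds of projected growth to provision ahead of time
    min_interval: float = 1.0    # shortest sampling period, used under intense I/O
    max_interval: float = 30.0   # longest sampling period, used when usage is flat
    window: int = 5              # number of recent samples in the rate estimate
    samples: list[tuple[float, float]] = field(default_factory=list)  # (time_s, used_bytes)

    def rate(self) -> float:
        """Mean growth rate (bytes/s) across the sample window; negative if usage shrinks."""
        if len(self.samples) < 2:
            return 0.0
        (t0, u0), (t1, u1) = self.samples[0], self.samples[-1]
        return (u1 - u0) / (t1 - t0) if t1 > t0 else 0.0

    def next_interval(self) -> float:
        """Sample more often when usage changes fast: aim for about a quarter
        disk of change between samples, clamped to [min_interval, max_interval]."""
        r = abs(self.rate())
        if r == 0:
            return self.max_interval
        return min(max((self.disk_size / 4) / r, self.min_interval), self.max_interval)

    def step(self, t: float, used: float) -> int:
        """Record a usage sample, then return the number of disks to attach (> 0)
        or detach (< 0); capacity is updated as if the action succeeded."""
        self.samples.append((t, used))
        del self.samples[:-self.window]          # keep only the sliding window
        predicted = used + max(self.rate(), 0.0) * self.horizon
        if predicted > self.capacity:
            # Attach enough disks to cover the projected usage.
            delta = math.ceil((predicted - self.capacity) / self.disk_size)
        else:
            # Detach surplus disks, keeping at least one disk of headroom above the projection.
            delta = -max(int((self.capacity - predicted) // self.disk_size) - 1, 0)
        self.capacity += delta * self.disk_size
        return delta
```

As an example, take `disk_size=100`, `capacity=200` and `horizon=10.0`. The samples `(0, 50)` and `(1, 80)` give a rate of 30 bytes/s and a projection of 380, so `step` returns 2 and capacity becomes 400 before usage ever reaches 200. A purely reactive scheme would wait until usage crossed a fixed threshold before attaching a disk. The look-ahead lets a predictive policy act earlier, at the cost of some temporary over-provisioning whenever the linear estimate overshoots.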