Towards a Cost-Aware Data Migration Approach for Key-Value Stores

Live data migration is an important technique for key-value stores. However, because key-value stores are stateful, run on new virtualization technology, must meet stringent low-latency requirements, and experience unexpected workload changes, those deployed in cloud environments face new data migration challenges: the effects of VM interference, and the need to trade off the two components of migration cost, namely migration time and performance impact. To address these challenges, we focus on the data migration problem in a load-rebalancing scenario and build a new framework that aims to rebalance load while minimizing migration cost. Using statistical machine learning, we build two interference-aware prediction models that predict the migration time and performance impact of each migration action, and then create a cost model that balances these two components of cost. A cost-aware migration algorithm uses the cost model and the balance rate to guide the choice among possible migration actions. We demonstrate the effectiveness of the data migration approach, the cost model, and the two prediction models using YCSB.
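To make the trade-off concrete, the following is a minimal sketch of how a cost model might combine the two predicted quantities and guide action selection. It is an illustration under stated assumptions, not the paper's actual model: the weighted-sum form, the weight `alpha`, the normalization by the largest predicted value, the `balance_gain` field, and the gain-per-cost selection rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationAction:
    """A candidate migration action with hypothetical predicted values."""
    name: str
    predicted_time_s: float     # output of a migration-time predictor (assumed)
    predicted_impact_ms: float  # output of a performance-impact predictor (assumed)
    balance_gain: float         # improvement in some balance metric (assumed)


def migration_cost(action, max_time_s, max_impact_ms, alpha=0.5):
    """Weighted sum of normalized time and impact.

    The weighted-sum form and alpha are assumptions for illustration;
    alpha near 1 favors short migrations, alpha near 0 favors low impact.
    """
    time_term = action.predicted_time_s / max_time_s
    impact_term = action.predicted_impact_ms / max_impact_ms
    return alpha * time_term + (1.0 - alpha) * impact_term


def choose_action(actions, alpha=0.5):
    """Return the action with the highest balance gain per unit of cost."""
    if not actions:
        raise ValueError("no candidate migration actions")
    # Normalize by the largest prediction; fall back to 1.0 if all are zero.
    max_time = max(a.predicted_time_s for a in actions) or 1.0
    max_impact = max(a.predicted_impact_ms for a in actions) or 1.0
    eps = 1e-9  # avoids division by zero for a zero-cost action
    return max(
        actions,
        key=lambda a: a.balance_gain
        / (migration_cost(a, max_time, max_impact, alpha) + eps),
    )
```

In this sketch, raising `alpha` shifts the choice toward actions that finish quickly, while lowering it shifts the choice toward actions that disturb foreground requests less.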
