HetStore: A Platform for IO Workload Assignment in a Heterogeneous Storage Environment

Assigning IO workloads to the best-suited backend storage is a central problem in the design of cloud systems, and it has become more pressing as emerging Software Defined Storage (SDS) systems make storage environments increasingly heterogeneous. In this paper, we propose a solution to optimal IO workload assignment that uses statistical modelling to estimate performance measures such as throughput and IOPS. The proposed system uses support vector regression to estimate the performance of each IO workload on every available SDS system and assigns the workload accordingly. As a proof of concept, we demonstrate our solution in a heterogeneous environment comprising HDFS, GlusterFS, and Ceph. We first show that throughput and IOPS are estimated with a coefficient of determination above 0.65 in all cases. We then analyze the use of this regression model to classify workloads to the SDS backend that will maximize throughput.
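The assignment step and the accuracy measure described above can be illustrated with a minimal sketch. It is not the system's implementation: the feature layout (request size in KB, fraction of random accesses) and the three linear predictor functions are hypothetical stand-ins for trained per-backend support vector regression models, and all names are chosen only for illustration.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]              # hypothetical: (request size in KB, random-access fraction)
Predictor = Callable[[Features], float]  # maps workload features to estimated throughput


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def assign_backend(workload: Features, predictors: Dict[str, Predictor]) -> str:
    """Return the backend whose model predicts the highest throughput."""
    return max(predictors, key=lambda name: predictors[name](workload))


# Hypothetical stand-ins for trained per-backend regressors (assumed coefficients).
predictors: Dict[str, Predictor] = {
    "HDFS": lambda w: 40.0 + 0.5 * w[0] - 30.0 * w[1],
    "GlusterFS": lambda w: 60.0 + 0.1 * w[0] - 10.0 * w[1],
    "Ceph": lambda w: 50.0 + 0.2 * w[0] + 5.0 * w[1],
}

if __name__ == "__main__":
    large_sequential = (128.0, 0.0)
    small_random = (4.0, 1.0)
    print(assign_backend(large_sequential, predictors))
    print(assign_backend(small_random, predictors))
```

In practice each stand-in function would be replaced by the prediction call of a regressor fitted on measured workload traces for that backend, and `r_squared` would be evaluated on held-out measurements.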
