Workload Shifting: Contention-Insular Disk Arrays for Big Data Systems

In-place update indexes, unordered log-structured indexes, and ordered log-structured indexes are three typical data organizations, each designed to meet different workload requirements, and all widely used in big data storage systems. Workload requirements differ across phases of the data lifecycle, e.g., data is injected into the storage system in a write-optimized form and must later be read in a read-optimized form for analysis; this leads to data organization transformation (data transformation for short). However, simply mixing foreground data injection with background data transformation causes serious disk contention: frequent disk head seeks lower disk throughput, which both prolongs the data transformation process and increases foreground injection latency. In this paper, we propose \emph{Workload Shifting}, a novel log-structured design that shifts background data transformation away from foreground data injection. Compared with a conventional RAID0 disk array, \emph{Workload Shifting} effectively isolates background data transformation from foreground data injection, avoiding the disk contention between them and boosting the performance of both. We have implemented a \emph{Workload Shifting} prototype on a multi-disk array. Extensive experimental results show that, compared with conventional RAID0 disk arrays, \emph{Workload Shifting} avoids disk contention and significantly speeds up both data injection and data transformation.
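
To make the isolation idea concrete, here is a minimal sketch in Python. All names (Disk, WorkloadShiftingArray, and their methods) are our own illustration under assumed semantics, not the paper's implementation: foreground appends are confined to a single "logging" disk, background transformation touches only the remaining disks, and the logging role rotates when the current log disk fills.

```python
from collections import deque


class Disk:
    """In-memory stand-in for one physical disk (illustrative only)."""

    def __init__(self, capacity=4):
        self.log = []            # write-optimized, unordered records
        self.sorted_runs = []    # read-optimized, ordered output
        self.capacity = capacity

    def append(self, record):
        self.log.append(record)

    def full(self):
        return len(self.log) >= self.capacity


class WorkloadShiftingArray:
    """Rotates the 'logging' role among disks so foreground appends
    and background transformation never share a disk head."""

    def __init__(self, disks):
        self.disks = deque(disks)      # head of the ring absorbs new writes

    @property
    def log_disk(self):
        return self.disks[0]           # current foreground injection target

    def inject(self, record):
        # Foreground path: a purely sequential append on the logging disk.
        self.log_disk.append(record)
        if self.log_disk.full():
            self.shift()

    def transform(self):
        # Background path: re-sort unordered log records into ordered runs,
        # but only on disks that are NOT currently absorbing writes.
        for disk in list(self.disks)[1:]:
            if disk.log:
                disk.sorted_runs.append(sorted(disk.log))
                disk.log = []

    def shift(self):
        # The logging disk is full: rotate the role to the next disk; the
        # previous one becomes eligible for background transformation.
        self.disks.rotate(-1)


array = WorkloadShiftingArray([Disk() for _ in range(3)])
for key in [9, 3, 7, 1, 8, 2]:
    array.inject(key)                  # foreground data injection
array.transform()                      # background data transformation
```

The sketch captures the claimed benefit: the logging disk only ever serves sequential appends, and transformation I/O never touches it, so neither workload induces seeks against the other's disk heads.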
