Performance Optimization for All Flash Scale-Out Storage

The proliferation of big data analysis and the widespread use of public and private cloud services make it important to expand storage capacity as demand increases. Scale-out storage is gaining attention because it inherently provides scalable storage capacity. Meanwhile, flash SSDs are becoming popular as drop-in replacements for slow HDDs, which can improve system performance at least somewhat. However, the performance of a traditional scale-out storage system does not improve much when its HDDs are replaced with high-performance flash SSDs, because the whole system was designed around the HDD as its underlying storage device. In this paper, we identify performance problems in Ceph, a representative scale-out storage system, and our analysis shows that they are caused by 1) coarse-grained locking, 2) throttling logic, 3) batching-based operation latency, and 4) transaction overhead. We propose several optimization techniques for flash-based Ceph. First, we minimize coarse-grained locking. Second, we introduce a throttle policy and system tuning. Third, we develop non-blocking logging and lightweight transaction processing. Our experiments show that the optimized Ceph achieves up to a 20-fold improvement for small random writes and more than twice the performance for small random reads. We also show that performance increases linearly as more nodes are added.
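To make the idea of minimizing coarse-grained locking concrete, the following Python sketch contrasts a single store-wide lock with lock striping, where each object key hashes to one of several locks so that accesses to unrelated objects need not wait for each other. It is a generic illustration, not Ceph's locking code; the class names and the `n_stripes` parameter are hypothetical, and it assumes that objects in different stripes are independent.

```python
import threading
import zlib


class CoarseStore:
    """Baseline: one store-wide lock serializes every access."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def write(self, key, value):
        with self._lock:
            self._data[key] = value

    def read(self, key):
        with self._lock:
            return self._data.get(key)


class StripedStore:
    """Hypothetical sketch: each key maps to one of n_stripes locks, so
    accesses to keys in different stripes do not contend with each other."""

    def __init__(self, n_stripes=16):
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._shards = [{} for _ in range(n_stripes)]

    def stripe_of(self, key):
        # crc32 is stable across processes, unlike Python's randomized str hash.
        return zlib.crc32(key.encode()) % len(self._locks)

    def write(self, key, value):
        i = self.stripe_of(key)
        with self._locks[i]:  # only this key's stripe is locked
            self._shards[i][key] = value

    def read(self, key):
        i = self.stripe_of(key)
        with self._locks[i]:
            return self._shards[i].get(key)
```

In CPython the global interpreter lock limits the real parallel speedup, so the sketch shows the locking structure rather than the gains a native implementation could obtain.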
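Throttling, in general, caps the amount of outstanding work admitted into a stage of the I/O path. The sketch below is a hypothetical byte-based throttle: `get()` blocks while admitting a request would push the in-flight bytes above `max_bytes`, and `put()` releases them and wakes any waiters. The class and its parameters are assumptions for illustration, not Ceph's throttle implementation; `max_bytes` is simply the kind of limit that a throttle policy and system tuning would adjust.

```python
import threading


class ByteThrottle:
    """Hypothetical sketch: limits the number of bytes in flight."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self._in_flight = 0
        self._cond = threading.Condition()

    def get(self, nbytes):
        with self._cond:
            # Always admit when nothing is in flight, so a single request
            # larger than max_bytes cannot wait forever.
            while self._in_flight > 0 and self._in_flight + nbytes > self.max_bytes:
                self._cond.wait()
            self._in_flight += nbytes

    def put(self, nbytes):
        with self._cond:
            self._in_flight -= nbytes
            self._cond.notify_all()  # let blocked callers re-check the limit

    @property
    def in_flight(self):
        with self._cond:
            return self._in_flight
```

A caller would bracket each request with `get(size)` before issuing it and `put(size)` once it completes.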
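Non-blocking logging can be sketched as follows, assuming a single journal writer: callers enqueue a record and return immediately, and a dedicated writer thread drains whatever has accumulated, writes it as one batch, flushes, and then invokes each record's completion callback. The `NonBlockingLog` class, its `on_durable` callback, and the generic `sink` object are hypothetical; this is an illustration of the general technique, not the paper's implementation.

```python
import queue
import threading


class NonBlockingLog:
    """Hypothetical sketch: append() never waits for I/O; one writer
    thread batches queued records and signals completion afterwards."""

    _STOP = object()  # sentinel that tells the writer thread to exit

    def __init__(self, sink):
        self._q = queue.Queue()
        self._sink = sink  # any object with write(str) and flush()
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def append(self, record, on_durable=None):
        # Returns immediately; on_durable (if given) runs after the flush.
        self._q.put((record, on_durable))

    def _run(self):
        while True:
            batch = [self._q.get()]  # wait for at least one item
            while True:  # then take everything already queued, without waiting
                try:
                    batch.append(self._q.get_nowait())
                except queue.Empty:
                    break
            stop = any(item is self._STOP for item in batch)
            entries = [item for item in batch if item is not self._STOP]
            for record, _ in entries:
                self._sink.write(record + "\n")
            # flush() stands in for making the batch durable; a real
            # journal on a file would also need os.fsync().
            self._sink.flush()
            for _, callback in entries:
                if callback is not None:
                    callback()
            if stop:
                return

    def close(self):
        self._q.put(self._STOP)
        self._writer.join()
```

Because the writer batches only the records already queued, rather than waiting for a batch to fill or a timer to expire, a lone write is not held back by batching, while bursts of writes are still coalesced into a single flush.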
