SSD-optimized workload placement with adaptive learning and classification in HPC environments

In recent years, non-volatile memory devices such as SSD drives have emerged as a viable storage solution due to their increasing capacity and decreasing cost. Due to the unique capability and capacity requirements in large scale HPC (High Performance Computing) storage environment, a hybrid configuration (SSD and HDD) may represent one of the most available and balanced solutions considering the cost and performance. Under this setting, effective data placement as well as movement with controlled overhead become a pressing challenge. In this paper, we propose an integrated object placement and movement framework and adaptive learning algorithms to address these issues. Specifically, we present a method that shuffle data objects across storage tiers to optimize the data access performance. The method also integrates an adaptive learning algorithm where realtime classification is employed to predict the popularity of data object accesses, so that they can be placed on, or migrate between SSD or HDD drives in the most efficient manner. We discuss preliminary results based on this approach using a simulator we developed to show that the proposed methods can dynamically adapt storage placements and access pattern as workloads evolve to achieve the best system level performance such as throughput.

[1]  Feng Chen,et al.  Hystor: making the best use of solid state drives in high performance storage systems , 2011, ICS '11.

[2]  C. Kirsch Combo Drive : Optimizing Cost and Performance in a Heterogeneous Storage Device , 2009 .

[3]  Barry V. Hess,et al.  Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis , 2010, HiPC 2010.

[4]  Stephen L. Scott,et al.  Reliability-Aware Approach: An Incremental Checkpoint/Restart Model in HPC Environments , 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).

[5]  Ethan L. Miller,et al.  Replication under scalable hashing: a family of algorithms for scalable decentralized data distribution , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..

[6]  Antony I. T. Rowstron,et al.  Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility , 2001, SOSP.

[7]  Mithuna Thottethodi,et al.  SieveStore: a highly-selective, ensemble-level disk cache for cost-performance , 2010, ISCA '10.

[8]  S.A. Brandt,et al.  CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[9]  Hua Wang,et al.  A novel I/O scheduler for SSD with improved performance and lifetime , 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).

[10]  Marc Najork,et al.  Boxwood: Abstractions as the Foundation for Storage Infrastructure , 2004, OSDI.

[11]  Gang Wang,et al.  Proactive drive failure prediction for large scale storage systems , 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).

[12]  Xiaodong Zhang,et al.  Understanding intrinsic characteristics and system implications of flash memory based solid state drives , 2009, SIGMETRICS '09.

[13]  Qing Yang,et al.  I-CASH: Intelligently Coupled Array of SSD and HDD , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[14]  David R. Karger,et al.  Chord: A scalable peer-to-peer lookup service for internet applications , 2001, SIGCOMM '01.

[15]  Mark Handley,et al.  A scalable content-addressable network , 2001, SIGCOMM '01.

[16]  Jin-Soo Kim,et al.  OSSD: A case for object-based solid state drives , 2013, 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST).

[17]  Min Cai,et al.  A Peer-to-Peer Replica Location Service Based on a Distributed Hash Table , 2004, Proceedings of the ACM/IEEE SC2004 Conference.

[18]  Song Jiang,et al.  iTransformer: Using SSD to Improve Disk Scheduling for High-performance I/O , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.

[19]  Kai Shen,et al.  A performance evaluation of scientific I/O workloads on Flash-based SSDs , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.

[20]  Karsten Schwan,et al.  Managing Variability in the IO Performance of Petascale Storage Systems , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[21]  Chandramohan A. Thekkath,et al.  Petal: distributed virtual disks , 1996, ASPLOS VII.