Performance impact and interplay of SSD parallelism through advanced commands, allocation strategy and data granularity

With the development of the NAND-Flash technology, NAND-Flash based Solid-State Disk (SSD) has been attracting a great deal of attention from both industry and academia. While a range of SSD research topics, from interface techniques to buffer management and Flash Translation Layer (FTL), from performance to endurance and energy efficiency, have been extensively studied in the literature, the SSD being studied was by and large treated as a grey or black box in that many of the internal features such as advanced commands, physical-page allocation schemes and data granularity are hidden or assumed away. We argue that, based on our experimental study, it is these internal features and their interplay that will help provide the missing but significant insights to designing high-performance and high-endurance SSDs. In this paper, we use our highly accurate and multi-tiered SSD simulator, called SSDsim, to analyze several key internal SSD factors to characterize their performance impacts, interplay and parallelisms for the purpose of performance and endurance en-hancement of SSDs. From the results of our experiments, we found that: (1) larger pages tend to have significantly negative impact on SSD performance under many workloads; (2) different physical-page allocation schemes have different deployment en-vironments, where an optimal allocation scheme can be found for each workload; (3) although advanced commands provided by flash manufacturers can improve performance in some cases, they may jeopardize the SSD performance and endurance when used inappropriately; (4) since the parallelisms of SSD can be classified into four levels, namely, channel-level, chip-level, die-level and plane-level, the priority order of SSD parallelism, resulting from the strong interplay among physical-page allocation schemes and advanced commands, can have a very significant impact on SSD performance and endurance.

[1]  Hyojun Kim,et al.  BPLRU: A Buffer Management Scheme for Improving Random Writes in Flash Storage , 2008, FAST.

[2]  Joonwon Lee,et al.  CFLRU: a replacement algorithm for flash memory , 2006, CASES '06.

[3]  Dan Feng,et al.  Achieving page-mapping FTL performance at block-mapping FTL cost by hiding address translation , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[4]  Goetz Graefe,et al.  The five-minute rule twenty years later, and how flash memory changes the rules , 2007, DaMoN '07.

[5]  Youngjae Kim,et al.  FlashSim: A Simulator for NAND Flash-Based Solid-State Drives , 2009, 2009 First International Conference on Advances in System Simulation.

[6]  KimJin-Soo,et al.  A multi-channel architecture for high-performance NAND flash-based storage system , 2007 .

[7]  Eui-Young Chung,et al.  Design and analysis of flash translation layers for multi-channel NAND flash-based storage devices , 2009, IEEE Transactions on Consumer Electronics.

[8]  Joonwon Lee,et al.  Exploiting Internal Parallelism of Flash-based SSDs , 2010, IEEE Computer Architecture Letters.

[9]  Peter Desnoyers,et al.  Empirical evaluation of NAND flash memory performance , 2010, OPSR.

[10]  Jin-Soo Kim,et al.  FAB: flash-aware buffer management policy for portable media players , 2006, IEEE Transactions on Consumer Electronics.

[11]  Gregory R. Ganger,et al.  The DiskSim Simulation Environment Version 4.0 Reference Manual (CMU-PDL-08-101) , 1998 .

[12]  Seung Ryoul Maeng,et al.  FTL design exploration in reconfigurable high-performance SSD for server applications , 2009, ICS.

[13]  Sang-Won Lee,et al.  System Software for Flash Memory: A Survey , 2006, EUC.

[14]  E. L. Miller,et al.  Building Flexible , Fault-Tolerant Flash-based Storage Systems , 2009 .

[15]  Bruce Jacob,et al.  The performance of PC solid-state disks (SSDs) as a function of bandwidth, concurrency, device architecture, and system organization , 2009, ISCA '09.

[16]  Youngjae Kim,et al.  DFTL: a flash translation layer employing demand-based selective caching of page-level address mappings , 2009, ASPLOS.

[17]  Peter Desnoyers,et al.  Write Endurance in Flash Drives: Measurements and Analysis , 2010, FAST.

[18]  Adam Leventhal Flash Storage Today , 2008, ACM Queue.

[19]  Mark Moshayedi,et al.  Enterprise SSDs , 2008, ACM Queue.

[20]  MaengSeungryoul,et al.  Exploiting Internal Parallelism of Flash-based SSDs , 2010 .

[21]  Sivan Toledo,et al.  Algorithms and data structures for flash memories , 2005, CSUR.

[22]  Joonwon Lee,et al.  A multi-channel architecture for high-performance NAND flash-based storage system , 2007, J. Syst. Archit..

[23]  Seung Ryoul Maeng,et al.  A buffer replacement algorithm exploiting multi-chip parallelism in solid state disks , 2009, CASES '09.

[24]  JacobBruce,et al.  The performance of PC solid-state disks (SSDs) as a function of bandwidth, concurrency, device architecture, and system organization , 2009 .

[25]  Xiaodong Zhang,et al.  Understanding intrinsic characteristics and system implications of flash memory based solid state drives , 2009, SIGMETRICS '09.

[26]  Roberto Bez,et al.  Introduction to flash memory , 2003, Proc. IEEE.

[27]  Rina Panigrahy,et al.  Design Tradeoffs for SSD Performance , 2008, USENIX ATC.

[28]  Arun Jagatheesan,et al.  Understanding the Impact of Emerging Non-Volatile Memories on High-Performance, IO-Intensive Computing , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[29]  Sang-Won Lee,et al.  A log buffer-based flash translation layer using fully-associative sector translation , 2007, TECS.

[30]  Paul H. Siegel,et al.  Characterizing flash memory: Anomalies, observations, and applications , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).