Exploring and Exploiting the Multilevel Parallelism Inside SSDs for Improved Performance and Endurance

SSDs offer internal parallelism at four different levels: channel, chip, die, and plane. How these levels are exploited directly and significantly affects the performance and endurance of SSDs, and is in turn determined primarily by three internal factors: advanced commands, allocation schemes, and the priority order in which the four levels of parallelism are exploited. In this paper, we analyze these internal factors through an in-depth experimental study to characterize their impacts, interplay, and parallelism, with the goal of improving SSD performance and endurance. We reach the following key conclusions: 1) Different advanced commands provided by Flash manufacturers exploit different levels of parallelism inside SSDs, and they can either improve or degrade SSD performance and endurance depending on how they are used; 2) Different physical-page allocation schemes employ different advanced commands and exploit different levels of parallelism inside SSDs, giving rise to different performance and endurance impacts; 3) Among the three internal factors, the priority order of using the four levels of parallelism has the most significant impact on performance and endurance. The optimal priority order is found to be: 1) channel-level parallelism; 2) die-level parallelism; 3) plane-level parallelism; and 4) chip-level parallelism.
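To make the notion of a priority order concrete, the following is a minimal sketch, not the allocation scheme evaluated in the paper. It assumes a simple static striping policy in which consecutive logical page numbers are spread first across the highest-priority level and only then across the next one. The geometry (4 channels, 2 chips per channel, 2 dies per chip, 2 planes per die) and the names `GEOMETRY`, `PRIORITY`, and `stripe` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Mapping, Sequence

# Hypothetical SSD geometry: number of units at each level (an assumption,
# not taken from the paper).
GEOMETRY: Mapping[str, int] = {"channel": 4, "chip": 2, "die": 2, "plane": 2}

# Priority order reported as optimal in the abstract:
# channel first, then die, then plane, and chip last.
PRIORITY: Sequence[str] = ("channel", "die", "plane", "chip")


def stripe(lpn: int,
           geometry: Mapping[str, int] = GEOMETRY,
           priority: Sequence[str] = PRIORITY) -> Dict[str, int]:
    """Map a logical page number to a (channel, die, plane, chip) index.

    Mixed-radix decomposition: the first level in `priority` varies fastest,
    so consecutive pages are spread over it before any other level is used.
    """
    if lpn < 0:
        raise ValueError("logical page number must be non-negative")
    address: Dict[str, int] = {}
    remaining = lpn
    for level in priority:
        address[level] = remaining % geometry[level]
        remaining //= geometry[level]
    return address


if __name__ == "__main__":
    for lpn in (0, 1, 4, 8, 16):
        print(lpn, stripe(lpn))
```

Under these assumptions, pages 0 to 3 land on four different channels, page 4 is the first to move to a second die, page 8 the first to use a second plane, and page 16 the first to use a second chip. Passing a different `priority` tuple changes which level absorbs consecutive requests first, which is the design choice the abstract identifies as most influential.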
