Physically addressed queueing (PAQ): Improving parallelism in solid state disks

NAND flash storage has proven to be a competitive alternative to traditional disk for its properties of high random-access speeds, low-power and its presumed efficacy for random-reads. Ironically, we demonstrate that when packaged in SSD format, there arise many barriers to reaching full parallelism in reads, resulting in random writes outperforming them. Motivated by this, we propose Physically Addressed Queuing (PAQ), a request scheduler that avoids resource contention resultant from shared SSD resources. PAQ makes the following major contributions: First, it exposes the physical addresses of requests to the scheduler. Second, I/O clumping is utilized to select groups of operations that can be simultaneously executed without major resource conflict. Third, inter-request NAND transaction packing empowers multi-plane-mode operations. We implement PAQ in a cycle-accurate simulator and demonstrate bandwidth and IOPS improvements greater than 62% and latency decreases as much as 41.6% for random reads, without degrading performance of other access types.

[1]  Calvin Lin,et al.  Adaptive History-Based Memory Schedulers for Modern Processors , 2006, IEEE Micro.

[2]  Gregory R. Ganger,et al.  The DiskSim Simulation Environment Version 4.0 Reference Manual (CMU-PDL-08-101) , 1998 .

[3]  Kenneth A. Ross,et al.  SSD bufferpool extensions for database systems , 2010, Proc. VLDB Endow..

[4]  Bruce Jacob,et al.  The performance of PC solid-state disks (SSDs) as a function of bandwidth, concurrency, device architecture, and system organization , 2009, ISCA '09.

[5]  Youngjae Kim,et al.  FlashSim: A Simulator for NAND Flash-Based Solid-State Drives , 2009, 2009 First International Conference on Advances in System Simulation.

[6]  Joonwon Lee,et al.  Exploiting Internal Parallelism of Flash-based SSDs , 2010, IEEE Computer Architecture Letters.

[7]  Jack W. Davidson,et al.  Memory access coalescing: a technique for eliminating redundant memory accesses , 1994, PLDI '94.

[8]  Won-Taek Lim,et al.  Effective Management of DRAM Bandwidth in Multicore Processors , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[9]  A RossKenneth,et al.  SSD bufferpool extensions for database systems , 2010, VLDB 2010.

[10]  Trevor N. Mudge,et al.  Improving NAND Flash Based Disk Caches , 2008, 2008 International Symposium on Computer Architecture.

[11]  Yang Liu,et al.  RAF: A Random Access First Cache Management to Improve SSD-Based Disk Cache , 2010, 2010 IEEE Fifth International Conference on Networking, Architecture, and Storage.

[12]  Rina Panigrahy,et al.  Design Tradeoffs for SSD Performance , 2008, USENIX Annual Technical Conference.

[13]  Mor Harchol-Balter,et al.  ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[14]  Sally A. McKee,et al.  Access order and effective bandwidth for streams on a Direct Rambus memory , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.

[15]  Steven Swanson,et al.  Gordon: using flash memory to build fast, power-efficient clusters for data-intensive applications , 2009, ASPLOS.

[16]  Antony I. T. Rowstron,et al.  Migrating server storage to SSDs: analysis of tradeoffs , 2009, EuroSys '09.

[17]  John Shalf,et al.  NANDFlashSim: Intrinsic latency variation aware NAND flash memory system modeling and simulation at microarchitecture level , 2012, 012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST).

[18]  Mor Harchol-Balter,et al.  Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[19]  James E. Smith,et al.  Fair Queuing Memory Systems , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[20]  Xiaodong Zhang,et al.  Essential roles of exploiting internal parallelism of flash memory based solid state drives in high-speed data processing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[21]  Sally A. McKee,et al.  Access ordering and memory-conscious cache utilization , 1995, Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture.

[22]  Tao Li,et al.  Informed Microarchitecture Design Space Exploration Using Workload Dynamics , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[23]  Joonwon Lee,et al.  A multi-channel architecture for high-performance NAND flash-based storage system , 2007, J. Syst. Archit..

[24]  Rajesh K. Gupta,et al.  Moneta: A High-Performance Storage Array Architecture for Next-Generation, Non-volatile Memories , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[25]  William J. Dally,et al.  Memory access scheduling , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[26]  JacobBruce,et al.  The performance of PC solid-state disks (SSDs) as a function of bandwidth, concurrency, device architecture, and system organization , 2009 .

[27]  Onur Mutlu,et al.  Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[28]  MaengSeungryoul,et al.  Exploiting Internal Parallelism of Flash-based SSDs , 2010 .