SOML Read: Rethinking the Read Operation Granularity of 3D NAND SSDs

NAND-based solid-state disks (SSDs) are known for their superior random read/write performance, which stems from the high degree of multi-chip parallelism they exhibit. As chip density increases dramatically, fewer 3D NAND chips are needed to build an SSD than with previous-generation chips, so SSDs can be made more compact. However, this decrease in the number of chips also reduces overall throughput and prevents high-density 3D NAND SSDs from being widely adopted. We analyzed 600 storage workloads, and our analysis revealed that small read operations suffer significant performance degradation due to the reduced chip-level parallelism in newer 3D NAND SSDs. The main question is whether some of the inter-chip parallelism lost in these new SSDs (due to the reduced chip count) can be won back by enhancing intra-chip parallelism. Motivated by this question, we propose a novel SOML (Single-Operation-Multiple-Location) read operation, which performs several small intra-chip reads to different locations simultaneously, so that multiple requests can be serviced in parallel, thereby mitigating the parallelism-related bottlenecks. We also propose a corresponding SOML read scheduling algorithm to fully utilize the SOML read. Our experimental results with various storage workloads indicate that the SOML read-based SSD with 8 chips can outperform the baseline SSD with 16 chips.
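To make the idea of servicing several small reads with one intra-chip operation concrete, the following is a minimal toy sketch, not the scheduling algorithm proposed in the paper. It assumes a hypothetical 16 KB page, a hypothetical limit of four locations per combined read, and a simple greedy first-come grouping per chip; the names `ReadRequest` and `batch_small_reads` are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

PAGE_KB = 16        # assumed page size (illustrative, not from the paper)
MAX_LOCATIONS = 4   # assumed cap on locations per combined read (illustrative)


@dataclass(frozen=True)
class ReadRequest:
    req_id: int
    chip: int      # chip the requested data resides on
    size_kb: int   # amount of data requested


def batch_small_reads(requests, page_kb=PAGE_KB, max_locations=MAX_LOCATIONS):
    """Greedily group queued reads on the same chip into combined operations.

    A batch is closed when adding the next request would exceed the assumed
    page-sized budget or the assumed location limit. Arrival order is kept.
    """
    per_chip = defaultdict(list)
    for r in requests:
        per_chip[r.chip].append(r)

    batches = []
    for chip in sorted(per_chip):
        current, used = [], 0
        for r in per_chip[chip]:
            fits = used + r.size_kb <= page_kb and len(current) < max_locations
            if current and not fits:
                batches.append((chip, current))
                current, used = [], 0
            # A request larger than the budget still gets its own batch.
            current.append(r)
            used += r.size_kb
        if current:
            batches.append((chip, current))
    return batches


if __name__ == "__main__":
    queue = [
        ReadRequest(0, chip=0, size_kb=4),
        ReadRequest(1, chip=0, size_kb=4),
        ReadRequest(2, chip=1, size_kb=8),
        ReadRequest(3, chip=0, size_kb=8),
        ReadRequest(4, chip=0, size_kb=4),
        ReadRequest(5, chip=1, size_kb=16),
    ]
    for chip, batch in batch_small_reads(queue):
        print(chip, [r.req_id for r in batch])
```

Under these assumptions, each batch stands for one chip-level operation, so the number of batches compared with the number of requests gives a rough sense of how much intra-chip grouping could reduce per-chip queuing. A real scheduler would also have to respect the hardware constraints on which locations can be read together, which this sketch does not model.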
