An in-depth study of next generation interface for emerging non-volatile memories

Non-Volatile Memory Express (NVMe) is designed to unlock the potential of low-latency, random-access, memory-based storage devices. Specifically, NVMe employs rich communication and queuing mechanisms that can ideally schedule four billion I/O instructions for a single storage device. To explore NVMe under assorted user scenarios, we model diverse interface-level design parameters, such as PCI Express, the NVMe protocol, and different rich queuing mechanisms, while considering a wide spectrum of host-level system configurations. In this work, we also assemble a comprehensive memory stack with different types of emerging NVM technologies, which gives us detailed NVMe-related statistics such as I/O request lifespans and I/O thread-related parallelism. Our evaluation results reveal that i) while NVMe handshaking is lightweight for flash memory that uses block-based accesses (Block NVM), it can impose tremendous overheads for memristor technology (DRAM-like NVM); ii) contrary to common expectation, the performance of an NVMe-equipped system may not scale as the queue depth and the number of queues increase; and iii) systems with more and deeper queues atop a Block NVM can suffer significantly from tremendous host-side memory requirements, whereas a DRAM-like NVM can cause frequent system stalls due to NVMe's inefficient interrupt service routine.
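The "four billion" figure can be reproduced with simple arithmetic. The sketch below is illustrative only and is not the model used in this work. It assumes round limits of 64K queues and 64K entries per queue, one completion queue paired with each submission queue at the same depth, and the commonly used entry sizes of 64 bytes (submission) and 16 bytes (completion). The function names are hypothetical, and the memory estimate covers only the queue rings, not data buffers or other per-request host state.

```python
# Assumed round limits (64K queues, 64K entries each); real devices
# typically advertise far smaller values.
MAX_QUEUES = 2**16
MAX_QUEUE_DEPTH = 2**16

# Assumed entry sizes: 64-byte submission entries, 16-byte completion entries.
SQ_ENTRY_BYTES = 64
CQ_ENTRY_BYTES = 16


def outstanding_commands(num_queues: int, queue_depth: int) -> int:
    """Upper bound on commands that can be queued at once."""
    return num_queues * queue_depth


def queue_ring_bytes(num_queues: int, queue_depth: int) -> int:
    """Host memory for the queue rings alone (one SQ + one CQ per pair)."""
    per_slot = SQ_ENTRY_BYTES + CQ_ENTRY_BYTES
    return num_queues * queue_depth * per_slot


if __name__ == "__main__":
    configs = [(1, 32), (8, 1024), (MAX_QUEUES, MAX_QUEUE_DEPTH)]
    for queues, depth in configs:
        cmds = outstanding_commands(queues, depth)
        mib = queue_ring_bytes(queues, depth) / 2**20
        print(f"{queues:>6} queues x {depth:>6} deep: "
              f"{cmds:>13,} commands, {mib:,.2f} MiB of queue rings")
```

Under these assumptions the upper bound is 2^32 (about 4.29 billion) queued commands, and the rings alone would occupy 320 GiB at that extreme, which gives a rough sense of why more and deeper queues put pressure on host memory.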
