Latency Tails of Byte-Addressable Non-Volatile Memories in Systems

Next-generation non-volatile memories, such as Resistive RAM, Spin-Transfer Torque Magnetic RAM and Phase Change Memory, are byte-addressable and have very low latency, bridging the large performance gap between DRAM and NAND flash storage. For this reason we refer to them as Storage Class Memories (SCMs): ideally they would serve as main memory, but their non-volatility and high density could also meet some of the needs of durable storage. Because using SCMs as main memory will require significant changes to prevailing CPU architectures, we initially focused on enabling their early market adoption as ultrafast storage in commodity systems. In stark contrast to NAND flash, whose read latency of about a tenth of a millisecond dominates the total system response latency to a storage request, SCM-based devices are so fast that the latencies of the attach interface and the host device driver, which are in the microsecond range, begin to dominate the total response latency, greatly hindering the performance of SCMs in commodity systems. Moreover, latency jitter introduced by host hardware and software and by controller firmware further degrades the Quality of Service (QoS) of solid-state drives based on SCMs. In this paper we discuss various factors that degrade QoS, including host software and machine configuration. A specific fine-tuning of an x86 host machine, combined with a well-designed device driver and a low-latency device controller, yields an ultra-low-latency system with excellent QoS. We measure latencies below 4 μs for 99.999% of I/O requests at queue depth one, and below 7 μs at queue depth 32, from an SCM-based block device on a PCI Express interface.
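The QoS figures above are tail percentiles of the per-request latency distribution. The abstract does not say how the percentiles are computed, so the following is only a minimal sketch, assuming the common nearest-rank definition. The function name `nearest_rank_percentile` and the synthetic sample values are hypothetical illustrations, not measurements from this work. The percentile is passed as a string (e.g. "99.999") so that it is converted exactly with `Fraction`, avoiding floating-point rounding in the rank computation.

```python
import math
from fractions import Fraction
from typing import Sequence, Union


def nearest_rank_percentile(samples_us: Sequence[float],
                            pct: Union[str, int, Fraction]) -> float:
    """Return the nearest-rank percentile of latency samples (hypothetical helper).

    The result is the smallest sample such that at least pct% of all
    samples are less than or equal to it.
    """
    if not samples_us:
        raise ValueError("need at least one latency sample")
    # Exact rational arithmetic: Fraction("99.999") == 99999/1000.
    p = Fraction(pct)
    if not 0 < p <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(samples_us)
    # Nearest-rank method: 1-based rank = ceil(p/100 * N).
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical data: 100,000 requests at 3 us with one or two slow outliers.
one_outlier = [3.0] * 99_999 + [50.0]
two_outliers = [3.0] * 99_998 + [50.0, 60.0]
print(nearest_rank_percentile(one_outlier, "99.999"))
print(nearest_rank_percentile(two_outliers, "99.999"))
```

With 100,000 samples the 99.999th-percentile rank is 99,999, so a single slow request leaves the percentile unchanged while a second one moves it. For fewer than 100,000 samples the rank equals N, and the "five nines" percentile collapses to the maximum observed latency; a meaningful 99.999% figure therefore needs at least on the order of 10^5 requests.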