Parallelizing garbage collection with I/O to improve flash resource utilization

Garbage Collection (GC) has been a critical optimization target for improving the performance of flash-based Solid State Drives (SSDs); the long-lasting GC process occupies the flash resources, thereby blocking normal I/O requests and increasing response times. This is a well-documented problem, and a wide range of prior works successfully hide the negative impact of GC on the 1/O response times. In this paper, however, we unveil another serious side-effect of GC, called the plane under-utilization problem. More specifically, while a plane is busy doing GC, the other plane(s) in the same die remain idle, as all the planes in a die share a single command and address path that is dedicated to the GC. We also note that most of the state-of-the-art proposals attacking the GC impact on I/O response times are not able to resolve the plane under-utilization problem, and in turn, miss a great potential to further improve the SSD performance. Thus, we next propose a scheduling technique, I/O-parallelized GC, which leverages the idle planes during GC to serve the blocked I/O requests. As a result, flash resources (planes) can be active during the most of GC time and the blocked I/O requests can get serviced quickly, and in turn, an improved SSD performance can be achieved. Using simulation-based evaluations over a wide variety of workloads, we show that the proposed I/O-parallelized GC scheme can improve the response times of the GC-affected I/O requests by 83% (reads) and 70% (writes), by increasing the average plane utilization from the (two planes-per-die) baseline 50% to 74.4% during GC. The I/O-parallelized GC is orthogonal to prior proposals that hide GC overheads; so, they can be combined for further SSD performance improvement.

[1]  Benny Van Houdt,et al.  A mean field model for a class of garbage collection algorithms in flash-based solid state drives , 2013, Queueing Systems.

[2]  Xiaodong Zhang,et al.  Essential roles of exploiting internal parallelism of flash memory based solid state drives in high-speed data processing , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.

[3]  Gregory R. Ganger,et al.  The DiskSim Simulation Environment Version 4.0 Reference Manual (CMU-PDL-08-101) , 1998 .

[4]  Seung Ryoul Maeng,et al.  FTL design exploration in reconfigurable high-performance SSD for server applications , 2009, ICS.

[5]  Tao Xie,et al.  DLOOP: A Flash Translation Layer Exploiting Plane-Level Parallelism , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[6]  Mahmut T. Kandemir,et al.  Revisiting widely held SSD expectations and rethinking system-level implications , 2013, SIGMETRICS '13.

[7]  Antony I. T. Rowstron,et al.  Write off-loading: Practical power management for enterprise storage , 2008, TOS.

[8]  Onur Mutlu,et al.  Program interference in MLC NAND flash memory: Characterization, modeling, and mitigation , 2013, ICCD.

[9]  Mahmut T. Kandemir,et al.  An Evaluation of Different Page Allocation Strategies on High-Speed SSDs , 2012, HotStorage.

[10]  Hong Jiang,et al.  Performance impact and interplay of SSD parallelism through advanced commands, allocation strategy and data granularity , 2011, ICS '11.

[11]  Mahmut T. Kandemir,et al.  HIOS: A host interface I/O scheduler for Solid State Disks , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).

[12]  Bingsheng He,et al.  Improving Update-Intensive Workloads on Flash Disks through Exploiting Multi-Chip Parallelism , 2015, IEEE Transactions on Parallel and Distributed Systems.

[13]  Rina Panigrahy,et al.  Design Tradeoffs for SSD Performance , 2008, USENIX ATC.

[14]  Werner Bux,et al.  Performance of greedy garbage collection in flash-based solid-state drives , 2010, Perform. Evaluation.

[15]  Andrew A. Chien,et al.  Tiny-Tail Flash , 2017, ACM Trans. Storage.

[16]  Peter Desnoyers,et al.  Analytic Models of SSD Write Performance , 2014, TOS.

[17]  Yinfeng Wang,et al.  Exploiting latency variation for access conflict reduction of NAND flash memory , 2016, 2016 32nd Symposium on Mass Storage Systems and Technologies (MSST).

[18]  Mahmut T. Kandemir,et al.  Taking Garbage Collection Overheads Off the Critical Path in SSDs , 2012, Middleware.

[19]  Jongman Kim,et al.  A semi-preemptive garbage collector for solid state drives , 2011, (IEEE ISPASS) IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE.

[20]  Mohammad Arjomand,et al.  Exploring the Potentials of Parallel Garbage Collection in SSDs for Enterprise Storage Systems , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.

[21]  Suzhen Wu,et al.  Exploiting request characteristics and internal parallelism to improve SSD performance , 2015, 2015 33rd IEEE International Conference on Computer Design (ICCD).

[22]  Jongman Kim,et al.  Preemptible I/O Scheduling of Garbage Collection for Solid State Drives , 2013, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[23]  Sang-Won Lee,et al.  A survey of Flash Translation Layer , 2009, J. Syst. Archit..

[24]  Hamid Sarbazi-Azad,et al.  TBM: Twin Block Management Policy to Enhance the Utilization of Plane-Level Parallelism in SSDs , 2016, IEEE Computer Architecture Letters.