Paper information - Parallelizing garbage collection with I/O to improve flash resource utilization

Garbage Collection (GC) has been a critical optimization target for improving the performance of flash-based Solid State Drives (SSDs); the long-lasting GC process occupies the flash resources, thereby blocking normal I/O requests and increasing response times. This is a well-documented problem, and a wide range of prior works successfully hide the negative impact of GC on I/O response times. In this paper, however, we unveil another serious side effect of GC, called the plane under-utilization problem. More specifically, while a plane is busy doing GC, the other plane(s) in the same die remain idle, as all the planes in a die share a single command and address path that is dedicated to the GC. We also note that most of the state-of-the-art proposals attacking the GC impact on I/O response times are not able to resolve the plane under-utilization problem and, in turn, miss a great potential to further improve SSD performance. Thus, we next propose a scheduling technique, I/O-parallelized GC, which leverages the idle planes during GC to serve the blocked I/O requests. As a result, flash resources (planes) can be active during most of the GC time and the blocked I/O requests can be serviced quickly, which in turn improves SSD performance. Using simulation-based evaluations over a wide variety of workloads, we show that the proposed I/O-parallelized GC scheme can improve the response times of the GC-affected I/O requests by 83% (reads) and 70% (writes), by increasing the average plane utilization during GC from the baseline 50% (two planes per die) to 74.4%. The I/O-parallelized GC is orthogonal to prior proposals that hide GC overheads, so they can be combined for further SSD performance improvement.
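The plane utilization figure quoted in the abstract can be illustrated with a small toy model. The sketch below is not the authors' simulator or scheduler; the names `GCWindow` and `plane_utilization`, the slot-based time model, and the example numbers are all assumptions made for illustration. It only shows why a two-plane die sits at 50% utilization when GC monopolizes one plane, and how serving blocked I/O on the idle plane raises that figure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GCWindow:
    """Hypothetical description of one GC episode on a single die."""
    planes_per_die: int           # planes sharing the die's command/address path
    gc_slots: int                 # time slots during which GC occupies one plane
    blocked_io_slots: List[int]   # per idle plane: slots of blocked I/O it could serve


def plane_utilization(window: GCWindow, parallelize_io: bool) -> float:
    """Return the fraction of plane-slots that are busy during the GC window."""
    idle_planes = window.planes_per_die - 1
    if len(window.blocked_io_slots) != idle_planes:
        raise ValueError("expected one blocked-I/O entry per idle plane")

    total_slots = window.planes_per_die * window.gc_slots
    busy_slots = window.gc_slots  # the plane doing GC is busy throughout

    if parallelize_io:
        # Each idle plane serves blocked I/O, capped by the GC window length.
        for pending in window.blocked_io_slots:
            busy_slots += min(pending, window.gc_slots)

    return busy_slots / total_slots


# Illustrative numbers only: two planes per die, 100 GC slots,
# and 48 slots of blocked I/O that the idle plane could absorb.
window = GCWindow(planes_per_die=2, gc_slots=100, blocked_io_slots=[48])
baseline = plane_utilization(window, parallelize_io=False)
parallel = plane_utilization(window, parallelize_io=True)
print(f"baseline: {baseline:.2f}, I/O-parallelized: {parallel:.2f}")
```

In this model the baseline is always 1 / `planes_per_die`, which matches the 50% baseline the abstract reports for two planes per die. The improved figure depends entirely on how much blocked I/O is assumed to be available, so it should not be read as a reproduction of the paper's 74.4% result.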

Mahmut T. Kandemir | Chita R. Das | Myoungsoo Jung | Wonil Choi
