Facilitating external sorting on SMR-based large-scale storage systems

Abstract: In the big data era, the capability to process and store massive volumes of data has become a necessity for data-intensive computing. To meet the requirements of big data processing, the storage-centric computing concept of processing data within storage devices has gained popularity over the years, because the latency and energy cost of moving data between host systems and storage devices has gradually come to exceed that of processing the data. One of the fundamental data processing techniques for data-intensive computing is external sorting, which is widely used in database management systems (DBMS) and the Hadoop framework. Meanwhile, to store ever-increasing volumes of data, shingled magnetic recording (SMR) drives have been proposed to increase the areal density of conventional hard disk drives (HDDs) by overlapping adjacent tracks. SMR is widely regarded as a promising technology for big data applications because it boosts HDD capacity without significant technology changes. Nevertheless, the overlapped track layout of SMR drives imposes a sequential write constraint on incoming write traffic, which degrades the efficiency of external sorting on SMR drives. This observation motivates us to propose an SMR-based External Merge Sort (SMR-EMS) strategy for SMR-based large-scale storage systems. The strategy has two goals: alleviating the negative impact of the sequential write constraint, and enhancing the performance of external sorting on SMR drives by applying the concept of storage-centric computing. Experiments were conducted to demonstrate that the proposed strategy improves the efficiency of external merge sorting on SMR drives.
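
For readers unfamiliar with external sorting, the following is a minimal sketch of the generic textbook external merge sort (run generation followed by a k-way merge). It is not the SMR-EMS strategy, whose details are not given in this abstract. The function name `external_merge_sort`, the `run_size` parameter, and the one-integer-per-line run file format are assumptions made for illustration only.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_merge_sort(values, run_size, workdir):
    """Yield the integers in `values` in ascending order.

    Hypothetical helper for illustration: at most `run_size` items are
    sorted in memory at a time; everything else lives in run files.
    """
    run_paths = []
    it = iter(values)

    # Phase 1 (run generation): sort one chunk in memory, then write it
    # front to back into its own run file.
    while True:
        chunk = sorted(islice(it, run_size))
        if not chunk:
            break
        fd, path = tempfile.mkstemp(dir=workdir, suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in chunk)
        run_paths.append(path)

    # Phase 2 (merge): stream every run through a k-way merge, so only
    # one pending item per run is held in memory.
    files = [open(p) for p in run_paths]
    try:
        readers = [(int(line) for line in f) for f in files]
        yield from heapq.merge(*readers)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

In this sketch every run file is written once, sequentially, and never updated in place. When the data needs several merge passes, intermediate runs are written and discarded repeatedly, and that write traffic is where a sequential write constraint such as the one SMR drives impose comes into play.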
