Modelling Speculative Prefetching for Hybrid Storage Systems

Parallel storage systems are highly scalable and widely used to support data-intensive applications. As future systems must process and store massive amounts of data, hybrid storage systems offer a way to meet a variety of demands, such as large storage capacity, high I/O performance, and low cost. A hybrid storage system (HSS) contains both high-end storage components (e.g., solid-state disks and hard disk drives) to guarantee performance and low-end storage components (e.g., tapes) to reduce cost. In an HSS, transferring data back and forth among solid-state disks (SSDs), hard disk drives (HDDs), and tapes plays a critical role in achieving high I/O performance. Prefetching is a promising way to reduce the latency of these data transfers. However, prefetching in an HSS is technically challenging because of a dilemma: aggressive prefetching is required to reduce I/O latency effectively, whereas overaggressive prefetching may waste I/O bandwidth by transferring useless data from HDDs to SSDs or from tapes to HDDs. To address this problem, we propose a multi-layer prefetching algorithm that speculatively prefetches data from tapes to HDDs and from HDDs to SSDs. To evaluate the algorithm, we develop an analytical model; the experimental results show that our prefetching algorithm improves the performance of hybrid storage systems.
