A cost-intelligent application-specific data layout scheme for parallel file systems

I/O data access is a recognized performance bottleneck in high-end computing. Several commercial and research parallel file systems have been developed in recent years to alleviate this bottleneck. These file systems perform well for some applications but poorly for others, and they have not yet reached their full potential in mitigating the I/O-wall problem. Because data access behavior is application dependent, in this study we follow the principle of application-specific optimization and propose a cost-intelligent data access strategy to improve the performance of parallel file systems. We first present a novel model that estimates the data access cost of different data layout policies. Next, we extend the cost model to calculate the overall I/O cost of a given application and to choose an appropriate layout policy for that application. A complex application may exhibit several different data access patterns, and for applications without a dominant pattern, averaging over those patterns may not yield the best layout. We therefore further propose a hybrid data replication strategy for such applications, in which a file can have replicas stored under different layout policies to achieve the best performance. We have conducted theoretical analysis and experimental testing to verify the proposed cost-intelligent layout approach. Analytical and experimental results show that the proposed cost model is effective and that the application-specific data layout approach achieves up to a 74% performance improvement for data-intensive applications.
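The abstract does not spell out the cost model itself, so the following Python sketch uses a deliberately simplified, hypothetical one to show how the three steps fit together. It estimates the cost of one request under a layout, sums those estimates over an application's trace to pick a single layout, and, for an application without a dominant pattern, lets each request be served by the cheapest of several replicas. Everything here is an illustrative assumption, not the authors' formulation. A layout is reduced to a stripe width `k` (the number of servers each process's data is spread over). The names `SystemParams`, `Request`, `request_cost`, `app_cost`, `best_layout`, and `hybrid_cost`, and the parameters `alpha`, `beta`, `t_seek`, and `t_byte`, are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemParams:
    """Hypothetical machine parameters (times in seconds, sizes in bytes)."""
    n_servers: int   # number of I/O servers
    alpha: float     # network startup time per sub-request
    beta: float      # network transfer time per byte
    t_seek: float    # disk startup (seek + rotation) time per sub-request
    t_byte: float    # disk transfer time per byte


@dataclass(frozen=True)
class Request:
    """One I/O phase: `procs` processes each access `size` bytes concurrently."""
    procs: int
    size: int


def request_cost(req: Request, k: int, sysp: SystemParams) -> float:
    """Estimated time of one request when each process's data is striped over k servers.

    Simplifying assumption: the procs * k sub-requests are spread evenly over
    the servers, and the busiest server, which serves its sub-requests one
    after another, determines the completion time.
    """
    if not 1 <= k <= sysp.n_servers:
        raise ValueError("stripe width k must be between 1 and n_servers")
    subrequests_per_server = math.ceil(req.procs * k / sysp.n_servers)
    time_per_subrequest = (sysp.alpha + sysp.t_seek
                           + (req.size / k) * (sysp.beta + sysp.t_byte))
    return subrequests_per_server * time_per_subrequest


def app_cost(trace: list[Request], k: int, sysp: SystemParams) -> float:
    """Overall I/O cost of an application trace under a single layout k."""
    return sum(request_cost(r, k, sysp) for r in trace)


def best_layout(trace: list[Request], sysp: SystemParams, candidates: list[int]) -> int:
    """Pick the candidate stripe width with the lowest overall cost."""
    return min(candidates, key=lambda k: app_cost(trace, k, sysp))


def hybrid_cost(trace: list[Request], replicas: list[int], sysp: SystemParams) -> float:
    """Cost when the file has one replica per layout in `replicas` and each
    request is served by whichever replica is cheapest for it."""
    return sum(min(request_cost(r, k, sysp) for k in replicas) for r in trace)


# Made-up example: 4 servers, network costs set to zero for simplicity.
sysp = SystemParams(n_servers=4, alpha=0.0, beta=0.0, t_seek=1.0, t_byte=0.001)
trace = [Request(procs=8, size=100),    # many small concurrent requests
         Request(procs=1, size=8000)]   # one large request

k = best_layout(trace, sysp, [1, 2, 4])
print(k, f"{app_cost(trace, k, sysp):.2f}")        # 2 9.20
print(f"{hybrid_cost(trace, [1, 4], sysp):.2f}")   # 5.20
```

With these made-up numbers, the many-small-request phase favors `k = 1` and the single-large-request phase favors `k = 4`. The best single layout is therefore a compromise (`k = 2`, estimated cost 9.20). Two replicas laid out with `k = 1` and `k = 4` cut the estimate to 5.20. This mirrors the motivation for hybrid replication given above. The sketch ignores the extra storage and write cost of maintaining replicas, which a real scheme would have to weigh.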
