Parallel I/O performance: From events to ensembles

Parallel I/O is fast becoming a bottleneck to the research agendas of many users of extreme-scale parallel computers. The principal cause is the concurrency explosion of high-end computation, coupled with the complexity of providing parallel file systems that perform reliably at such scales. Beyond being a bottleneck, parallel I/O performance at scale is notoriously variable: it is influenced by numerous factors inside and outside the application, which makes it extremely difficult to isolate cause and effect for individual performance events. In this paper, we propose a statistical approach to understanding I/O performance that moves from the analysis of performance events to the exploration of performance ensembles. Using this methodology, we examine two I/O-intensive scientific computations from cosmology and climate science, and demonstrate that our approach can identify application and middleware performance deficiencies, resulting in more than 4× run time improvement for both examined applications.
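To make the shift from events to ensembles concrete, the following is a minimal sketch of summarizing a set of I/O event durations as a distribution rather than inspecting each event in isolation. It is not the paper's implementation: the function name `summarize_ensemble`, the choice of statistics, and the sample durations are all assumptions made for illustration.

```python
import math
import statistics


def summarize_ensemble(durations):
    """Summarize I/O event durations (in seconds) as a distribution.

    Hypothetical helper: the statistics reported here are an illustrative
    choice, not the set used in the paper.
    """
    ordered = sorted(durations)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one duration")
    median = statistics.median(ordered)
    # Nearest-rank 95th percentile: smallest value with >= 95% of samples at or below it.
    p95 = ordered[math.ceil(0.95 * n) - 1]
    return {
        "count": n,
        "min": ordered[0],
        "median": median,
        "p95": p95,
        "max": ordered[-1],
        # How much slower the slowest event is than the typical one.
        "max_over_median": ordered[-1] / median,
    }


# Made-up write times for ten tasks; one task is a clear straggler.
sample = [1.0, 1.1, 0.9, 1.2, 1.0, 1.05, 0.95, 1.1, 1.0, 8.0]
summary = summarize_ensemble(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Viewed one event at a time, the 8-second write looks like a single anomaly. Viewed as an ensemble, the gap between the median and the maximum shows how far the tail sits from typical behavior, which is the kind of question a distribution-level analysis is meant to answer.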
