Enabling event tracing at leadership-class scale through I/O forwarding middleware

Event tracing is an important tool for understanding the performance of parallel applications. As concurrency increases in leadership-class computing systems, the quantity of performance log data can overload the parallel file system, perturbing the application being observed. In this work we present a solution for event tracing at leadership scale. We enhance the I/O forwarding system software to aggregate and reorganize log data prior to writing to the storage system, significantly reducing the burden on the underlying file system for this type of traffic. Furthermore, we augment the I/O forwarding system with a write buffering capability to limit the impact of artificial perturbations from log data accesses on traced applications. To validate the approach, we modify the Vampir tracing toolset to take advantage of this new capability and show that the approach increases the maximum traced application size by a factor of five, to more than 200,000 processes.
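To make the aggregation and write-buffering idea concrete, the following is a minimal single-stream sketch in C, not the paper's actual IOFSL implementation: small trace records are absorbed into a large in-memory buffer at the forwarding layer and drained to the file system as a few large sequential writes. All names here (fwd_buffer, fwd_append, fwd_flush) and the 4 MiB buffer size are illustrative assumptions.

/* Hypothetical sketch of write buffering at an I/O forwarding node. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_CAPACITY (4 * 1024 * 1024)   /* assumed 4 MiB aggregation buffer */

typedef struct {
    FILE  *out;   /* aggregated log file on the parallel file system */
    size_t used;  /* bytes currently buffered */
    char   data[BUF_CAPACITY];
} fwd_buffer;

/* Drain the buffer as one large, sequential write. */
static void fwd_flush(fwd_buffer *b)
{
    if (b->used > 0) {
        fwrite(b->data, 1, b->used, b->out);
        b->used = 0;
    }
}

/* Absorb one small trace record into the buffer; the caller returns
 * immediately, so the traced process is not stalled by file-system
 * latency on every event. */
static void fwd_append(fwd_buffer *b, const void *rec, size_t len)
{
    if (len > BUF_CAPACITY) {          /* oversized record: write through */
        fwd_flush(b);
        fwrite(rec, 1, len, b->out);
        return;
    }
    if (b->used + len > BUF_CAPACITY)  /* full: flush before appending */
        fwd_flush(b);
    memcpy(b->data + b->used, rec, len);
    b->used += len;
}

int main(void)
{
    static fwd_buffer b;               /* static: 4 MiB is too large for the stack */
    b.out = fopen("trace.log", "wb");
    if (b.out == NULL)
        return EXIT_FAILURE;

    /* Simulate many tiny per-process event records arriving at the
     * forwarding layer; they leave as a handful of large writes. */
    for (int i = 0; i < 100000; i++) {
        char rec[64];
        int n = snprintf(rec, sizeof rec, "rank 0 event %d\n", i);
        fwd_append(&b, rec, (size_t)n);
    }
    fwd_flush(&b);
    fclose(b.out);
    return EXIT_SUCCESS;
}

The design point is that the forwarding layer converts a stream of tiny, latency-sensitive writes into a few large sequential ones, which is what relieves the parallel file system; the middleware described in the paper additionally merges and reorganizes streams from many compute processes, which this single-stream sketch omits.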
