Automatic performance analysis with Periscope

Performance analysis is essential to fully exploit the potential of high‐performance computers. With the imminent arrival of petascale systems, which will consist of tens of thousands or even hundreds of thousands of processor cores, this task will increase in complexity. Hence, tools are required that automatically detect performance bottlenecks and thus ease the performance analysis of an application. On large‐scale systems, collecting information about the performance‐relevant events of an application can easily produce a huge amount of data that is very challenging to analyze. Aggregating the performance data at runtime and conducting the search for performance properties online lets users distill the essential performance bottlenecks without being overwhelmed by an unmanageable volume of data. In this paper we present recent developments in Periscope, a highly scalable tool for the automatic, distributed, online search for performance properties of large‐scale applications on high‐end computers. It both detects the performance bottlenecks that limit scalability on parallel systems and pinpoints issues affecting the single‐node performance of an application. Copyright © 2009 John Wiley & Sons, Ltd.
