Automatic On-Line Detection of MPI Application Structure with Event Flow Graphs

The deployment of ever larger HPC systems challenges the scalability of both applications and analysis tools. Performance analysis toolsets help users spot bottlenecks in their applications by either collecting aggregated statistics or generating lossless time-stamped traces. Although detailed traces are the best way to examine an application's behavior closely, collecting them is infeasible at extreme scales because of the huge volume of data generated.
