Recovering logical structure from Charm++ event traces

Asynchrony and non-determinism in Charm++ programs make their event traces difficult to analyze. We present a new framework for organizing event traces of parallel programs written in Charm++. Our reorganization provides context through logical structure, making such traces easier to explore and analyze. We describe several heuristics that compensate for dependencies between events that are missing because they currently cannot be easily recorded. We introduce a new task ordering that recovers logical structure from the non-deterministic execution order. Using this logical structure, we define several metrics that help guide developers to performance problems. We demonstrate our approach on two proxy applications written in Charm++. Finally, we discuss the applicability of the framework to other task-based runtimes and provide guidelines for tracing to support this form of analysis.
