A Scalable Tools Communication Infrastructure

The Scalable Tools Communication Infrastructure (STCI) is an open-source collaborative effort to provide high-performance, scalable, resilient, and portable communication and process-control services for a wide variety of user and system tools. STCI is aimed specifically at tools for ultrascale computing and uses a component architecture so that the infrastructure can be tailored to a wide range of scenarios. This paper describes STCI’s design philosophy, the components that will be used to provide an STCI implementation for a range of ultrascale platforms, and the range of tool types it is intended to support. These include tools that support parallel run-time environments such as MPI, parallel application correctness and performance analysis tools, and system monitoring and management tools.
