High-Performance Computing (HPC) applications have historically executed within static resource allocations, using programming models that run independently of the resident system management stack (SMS). Achieving exascale performance that is both cost-effective and fits within site-level environmental constraints will, however, require that the application and the SMS collaboratively orchestrate the flow of work to optimize resource utilization and compensate for faults on the fly. The Process Management Interface - Exascale (PMIx) community is committed to establishing scalable workflow orchestration by defining an abstract set of interfaces through which applications and tools can interact with the resident SMS, and through which the various SMS components can interact with one another. This paper presents a high-level overview of the goals and current state of the PMIx standard and lays out a roadmap for future directions.
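To make the notion of an application interacting with the SMS through abstract interfaces concrete, the following is a minimal sketch of a PMIx client in C. It assumes a PMIx implementation (e.g., OpenPMIx) and uses only calls defined by the PMIx standard (PMIx_Init, PMIx_Get, PMIx_Finalize); it is an illustration of the interaction pattern, not an excerpt from the paper.

```c
/*
 * Minimal PMIx client sketch: connect to the resident SMS through the
 * local PMIx server, query job-level information, and disconnect.
 * Error handling is abbreviated for brevity.
 */
#include <stdio.h>
#include <string.h>
#include <pmix.h>

int main(void)
{
    pmix_proc_t myproc, wildcard;
    pmix_value_t *val = NULL;
    pmix_status_t rc;

    /* Register this process with the local PMIx server (part of the SMS) */
    rc = PMIx_Init(&myproc, NULL, 0);
    if (PMIX_SUCCESS != rc) {
        fprintf(stderr, "PMIx_Init failed: %s\n", PMIx_Error_string(rc));
        return 1;
    }

    /* Ask the SMS for job-level data: the number of processes in the job */
    PMIX_PROC_CONSTRUCT(&wildcard);
    strncpy(wildcard.nspace, myproc.nspace, PMIX_MAX_NSLEN);
    wildcard.rank = PMIX_RANK_WILDCARD;
    rc = PMIx_Get(&wildcard, PMIX_JOB_SIZE, NULL, 0, &val);
    if (PMIX_SUCCESS == rc) {
        printf("Rank %u of %u processes\n", myproc.rank, val->data.uint32);
        PMIX_VALUE_RELEASE(val);
    }

    /* Cleanly disconnect from the SMS */
    PMIx_Finalize(NULL, 0);
    return 0;
}
```

The same PMIx_Get pattern extends to other SMS-provided keys (e.g., local peers or node topology), which is how applications, tools, and SMS components exchange information without binding to a particular resource manager.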