Integration of the LAM/MPI environment and the PBS scheduling system

The growth of cluster computing as a viable option for high performance computing has led to the development of a comprehensive software stack for these machines, including cluster scheduling, parallel environments, and scientific libraries. OpenPBS or PBS/Pro is often used for scheduling, with LAM/MPI or MPICH used for parallel communication. This paper details the integration of the PBS scheduling and resource management infrastructure with the LAM/MPI parallel run-time environment. The integration provides a cluster with several features that, although commonly available on traditional supercomputers, have been conspicuously missing in cluster computing environments: fast job startup, proper resource cleanup, and detailed accounting.