A Scalable Process-Management Environment for Parallel Programs

We present a process management system for parallel programs such as those written using MPI. A primary goal of the system, which we call MPD (for multipurpose daemon), is to be scalable. By this we mean that startup of interactive parallel jobs comprising a thousand processes is quick, that signals can be quickly delivered to processes, and that stdin, stdout, and stderr are managed intuitively. Our primary target is parallel machines made up of clusters of SMPs, but the system is also useful in more tightly integrated environments. We describe how MPD enables much faster startup and better runtime management of MPICH jobs. We show how close control of stdio can support the easy implementation of a number of convenient system utilities, even a parallel debugger. MPD is implemented and freely distributed with MPICH.
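
The abstract does not say how output from a thousand processes reaches the user. One common way to keep such traffic scalable is to route it through a tree rather than a single hub. With a tree, the number of forwarding hops grows roughly as log2 of the process count instead of linearly. The Python sketch below is an illustrative assumption, not MPD's implementation. The implicit binary-tree layout (children of rank r are 2r+1 and 2r+2), the `rank: ` line prefix, and the names `children`, `tree_depth`, and `collect` are all hypothetical. The sketch forwards each process's stdout up the tree, tags every line with the rank that produced it, and counts the hops from the deepest process to the root.

```python
"""Hypothetical sketch: forwarding rank-labeled stdout up a binary tree.

Not MPD's code. The tree layout and the "rank: " label are assumptions.
"""
from typing import Dict, List


def children(rank: int, nprocs: int) -> List[int]:
    """Children of `rank` in an implicit binary tree rooted at rank 0."""
    return [c for c in (2 * rank + 1, 2 * rank + 2) if c < nprocs]


def tree_depth(nprocs: int) -> int:
    """Forwarding hops from the deepest rank (the last one) up to rank 0."""
    depth, node = 0, nprocs - 1
    while node > 0:
        node = (node - 1) // 2  # step to the parent
        depth += 1
    return depth


def collect(rank: int, nprocs: int, stdout: Dict[int, str]) -> List[str]:
    """Return this rank's output lines, each prefixed with its rank,
    followed by everything forwarded from its subtree (preorder, which
    is not necessarily rank order)."""
    lines = [f"{rank}: {line}" for line in stdout.get(rank, "").splitlines()]
    for child in children(rank, nprocs):
        lines.extend(collect(child, nprocs, stdout))
    return lines


if __name__ == "__main__":
    outputs = {0: "start", 1: "hello\nbye", 2: "hello"}
    # Prints "0: start", "1: hello", "1: bye", "2: hello", one per line.
    for line in collect(0, 3, outputs):
        print(line)
    # Prints 9: hops for 1,000 processes, versus 999 along a linear chain.
    print(tree_depth(1000))
```

The sketch is synchronous for clarity. A real process manager would forward lines as they arrive, so output from different ranks would interleave. The same parent/child relation could carry messages in the other direction, for example fanning a signal out from rank 0 in the same number of hops.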
