A Migration Framework for Executing Parallel Programs in the Grid

This paper describes a checkpointing mechanism for parallel programs and its potential application in Grid systems to migrate applications among Grid sites. The mechanism automatically, without user interaction, supports generic PVM programs created with the PGRADE Grid programming environment. It is general enough to be used by any Grid job manager, although the current implementation is connected to Condor. As a result, the integrated Condor/PGRADE system can guarantee the execution of any PVM program in the Grid, whereas the Condor system by itself can guarantee the execution of sequential jobs only. Integrating the Grid migration framework with the Mercury Grid monitor yields an observable Grid execution environment in which performance monitoring and visualization of PVM applications remain supported even while the application migrates in the Grid.
