Uniform job monitoring in the HPC-Europa project: data model, API and services

Job monitoring in Grid systems is an important challenge because Grid environments are volatile, heterogeneous and unreliable, and are managed by different middleware and monitoring tools. We present the infrastructure that we have designed and implemented in the HPC-Europa European project, which allows uniform access to job-monitoring information from different Virtual Organisations (VOs). The presented system isolates the user from the complexities of the underlying systems of each middleware. We explain the API that each centre has to implement to provide access to its job-monitoring information. Finally, we show the features that users can use in the portal to personalise their monitoring environment, i.e., to choose which information is presented and how.
