A system for monitoring and management of computational grids

As organizations begin to deploy large computational grids, it has become apparent that systems are needed for observing and controlling the resources, services, and applications that make up such grids. Administrators must observe resources and services to ensure that they are operating correctly, and must control them to ensure that their operation meets the needs of users. Users are also interested in the operation of resources and services so that they can choose the most appropriate ones to use. We describe a prototype system for monitoring and managing computational grids, together with the general software framework for control and observation in distributed environments on which it is based.
