Support for extensibility and site autonomy in the Legion grid system object model

Grid computing is the use of large collections of heterogeneous, distributed resources (including machines, databases, devices, and users) to support large-scale computations and wide-area data access. The Legion system is an implementation of a software architecture for grid computing. The basic philosophy underlying this architecture is the presentation of all grid resources as components of a single, seamless, virtual machine. Legion's architecture was designed to address the challenges of using and managing wide-area resources. Features of the architecture include: global, shared namespaces; support for heterogeneity; security; wide-area data sharing; wide-area parallel processing; application-adjustable fault tolerance; and efficient scheduling and comprehensive resource management. We present the core design of the Legion architecture, with a focus on the critical issues of extensibility and site autonomy. Grid systems software must be extensible because no static set of system-level decisions can meet all of the diverse, often conflicting, requirements of present and future user communities, nor take best advantage of unanticipated future hardware advances. Grid systems software must also support complete site autonomy, as resource owners will not turn control of their resources over to a dictatorial system.
