Faucets: efficient resource allocation on the computational grid

The idea of a "computational grid" suggests that high-end computational power can be treated as a utility, similar to electricity or water. Making this metaphor work requires a sophisticated "power distribution" infrastructure. We present the Faucets framework, which aims to provide (a) user-friendly distribution of compute power across the grid, (b) market-driven selection of compute servers for each job, resulting in effective utilization of resources across the grid, and (c) improved utilization within individual compute servers. Utilization within individual compute servers is improved through adaptive jobs and smarter job schedulers. Server selection is facilitated by quality-of-service (QoS) contracts for parallel jobs. Market efficiencies are attained through a bidding and evaluation system in which compute servers compete for every job by submitting bids, thus transforming the computational grid into a free market. Job submission and monitoring are simplified by several tools and databases within the Faucets system. We describe the overall architecture of the system. All the essential components of the system have been implemented and are described in this paper. We also discuss ongoing work and future research issues.
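The bidding and evaluation step can be illustrated with a minimal sketch. It is not the Faucets implementation: the `QoSContract` and `Bid` fields, the `satisfies` and `select_bid` helpers, and the "cheapest feasible bid wins" rule are all assumptions made for illustration, since the abstract does not specify the contract format or the evaluation criteria.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class QoSContract:
    """Hypothetical QoS contract attached to a parallel job (fields are assumed)."""
    min_procs: int         # fewest processors the job can run on
    max_procs: int         # most processors the job can use
    deadline_hours: float  # latest acceptable completion time
    max_price: float       # most the user is willing to pay


@dataclass(frozen=True)
class Bid:
    """Hypothetical bid submitted by a compute server for one job."""
    server: str
    procs: int
    est_completion_hours: float
    price: float


def satisfies(bid: Bid, contract: QoSContract) -> bool:
    """Return True if the bid meets every term of the contract."""
    return (
        contract.min_procs <= bid.procs <= contract.max_procs
        and bid.est_completion_hours <= contract.deadline_hours
        and bid.price <= contract.max_price
    )


def select_bid(bids: List[Bid], contract: QoSContract) -> Optional[Bid]:
    """Pick the cheapest feasible bid, breaking ties by earlier completion.

    This selection rule is an assumption for illustration only.
    """
    feasible = [b for b in bids if satisfies(b, contract)]
    if not feasible:
        return None
    return min(feasible, key=lambda b: (b.price, b.est_completion_hours))


if __name__ == "__main__":
    contract = QoSContract(min_procs=8, max_procs=64,
                           deadline_hours=10.0, max_price=100.0)
    bids = [
        Bid("a", 16, 12.0, 40.0),   # misses the deadline
        Bid("b", 32, 6.0, 70.0),    # feasible
        Bid("c", 64, 4.0, 90.0),    # feasible but more expensive
        Bid("d", 128, 2.0, 50.0),   # exceeds max_procs
    ]
    winner = select_bid(bids, contract)
    print(winner.server if winner else "no feasible bid")
```

Under these assumptions, each server prices the job itself and the client only filters and ranks the bids, which is the sense in which servers compete for every job.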
