An approach to grid resource selection and fault management based on ECA rules

In grid computing, resource management and fault tolerance are important issues. Real grid environments involve enormous numbers of application tasks and large amounts of required resources, and users expect quick responses to their requests, so real-time resource co-allocation can be large-scale. This paper proposes the Active Grid Information Server (AGIS). AGIS is a resource manager that provides optimal resource selection and a fault tolerance service, built on a database management system that supports event-condition-action (ECA) rules. Our resource manager automatically selects, from among the idle resources, the set of resources that achieves optimal performance, with turnaround time as the performance metric. The probability of failure is typically higher in grid computing than in traditional parallel computing, and resource failures can be fatal to job execution. A fault tolerance service is therefore essential in computational grids. Grid services are also often expected to meet minimum levels of Quality of Service (QoS) to operate as desired. To address this issue, we also propose a fault tolerance service that satisfies QoS requirements. Fault tolerance requires timely notification of changes, which creates the need for mechanisms that monitor and process such changes. ECA rules are a natural candidate to fulfill this need. We develop conservative tests for determining the termination and confluence of sets of ECA rules. We argue that employing ECA rules for both resource selection and fault tolerance improves efficiency and enables additional techniques. Furthermore, the proposed AGIS architecture offers a number of advantages owing to the performance and scalability that active databases can achieve.
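To make the ECA idea concrete, the following minimal sketch models a few fault-tolerance rules and applies one conservative termination test. The test builds a triggering graph with an edge from rule r1 to rule r2 whenever r1's action may raise the event that triggers r2. If the graph is acyclic, rule processing is guaranteed to terminate. This is the classic triggering-graph test from the active-database literature. It is shown here as an assumption about what such a conservative test looks like, not as the specific tests developed for AGIS. The `EcaRule` type and all rule and event names are hypothetical. Conditions and actions are kept as descriptive labels rather than executable code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EcaRule:
    name: str
    event: str       # event type that triggers the rule (hypothetical names)
    condition: str   # condition, kept as a descriptive label in this sketch
    action: str      # action, kept as a descriptive label in this sketch
    raises: frozenset = frozenset()  # event types the action may raise


def triggering_graph(rules):
    """Edge r1 -> r2 whenever r1's action may raise the event that triggers r2."""
    return {r.name: {s.name for s in rules if s.event in r.raises} for r in rules}


def guaranteed_to_terminate(rules):
    """Conservative test: an acyclic triggering graph guarantees termination.
    A cycle means termination is not guaranteed, not that processing must loop."""
    graph = triggering_graph(rules)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {name: WHITE for name in graph}

    def reaches_cycle(name):
        color[name] = GREY
        for succ in graph[name]:
            if color[succ] == GREY:
                return True  # back edge: a rule can (indirectly) retrigger itself
            if color[succ] == WHITE and reaches_cycle(succ):
                return True
        color[name] = BLACK
        return False

    return not any(color[n] == WHITE and reaches_cycle(n) for n in graph)


# Hypothetical fault-tolerance rule set: failure -> resubmit -> allocate -> duplicate.
fault_rules = [
    EcaRule("on_failure", "resource_failed", "job was running on the failed resource",
            "resubmit the job", frozenset({"job_submitted"})),
    EcaRule("on_submit", "job_submitted", "an idle resource meets the job's QoS",
            "allocate that resource", frozenset({"resource_allocated"})),
    EcaRule("on_allocate", "resource_allocated", "the job's QoS calls for a replica",
            "start a duplicate job"),
]
# A hypothetical extra rule that resubmits on allocation closes a cycle.
retry = EcaRule("retry", "resource_allocated", "allocation timed out",
                "resubmit the job", frozenset({"job_submitted"}))

print(guaranteed_to_terminate(fault_rules))            # True: the graph is a chain
print(guaranteed_to_terminate(fault_rules + [retry]))  # False: on_submit <-> retry
```

The second result only means this test cannot prove termination. The rules might still stop at run time because the "allocation timed out" condition eventually fails, and that gap is why the test is called conservative. A confluence check would additionally examine pairs of rules that can be triggered together and whose actions may not commute. It is omitted here.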
Our preliminary performance results indicate that the ECA rule-based approach to resource matching is fast and accurate and can keep up with high job-arrival rates, an important criterion for online resource matching systems. We describe Grid-JQA, an architecture supporting such rules in grid environments, and our current implementation of this architecture. We designed three heuristics for matching tasks to resources. Each heuristic takes into account the QoS requested by each task while minimizing the makespan as much as possible, and we compared the three via simulation. We also designed an optimal method based on the performance metric as a baseline for evaluating the heuristics. Our proposed solution achieves at least a 45% improvement over the general method, which uses a first-come, first-served (FCFS) strategy. The implementation and simulation results indicate that our approaches are promising. The resource manager finds the optimal set of resources to guarantee efficient job execution, and the fault manager guarantees that submitted jobs are completed. Job duplication also improves job execution even when some failures occur.
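The abstract does not spell out the three heuristics, so the following is only an illustrative sketch of QoS-constrained matching that minimizes makespan. It uses the standard Min-Min heuristic as a stand-in, not as one of the paper's heuristics. It also defines an FCFS baseline as an assumption: tasks are taken in arrival order, and each goes to the QoS-eligible resource that becomes free earliest, ignoring resource speed. The `Resource` and `Task` types, the use of bandwidth as the QoS attribute, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    speed: float       # work units per time unit (hypothetical)
    bandwidth: float   # QoS attribute offered by the resource (hypothetical)


@dataclass(frozen=True)
class Task:
    name: str
    work: float            # work units
    min_bandwidth: float   # QoS requested by the task (hypothetical)


def eligible(task, resources):
    """Resources that satisfy the task's QoS requirement."""
    cands = [r for r in resources if r.bandwidth >= task.min_bandwidth]
    if not cands:
        raise ValueError(f"no resource satisfies the QoS of {task.name}")
    return cands


def fcfs(tasks, resources):
    """Assumed baseline: arrival order, each task to the eligible resource free first."""
    ready = {r.name: 0.0 for r in resources}
    schedule = {}
    for t in tasks:
        r = min(eligible(t, resources), key=lambda r: ready[r.name])
        ready[r.name] += t.work / r.speed
        schedule[t.name] = r.name
    return schedule, max(ready.values())  # (assignment, makespan)


def min_min(tasks, resources):
    """QoS-constrained Min-Min: repeatedly commit the (task, resource) pair
    with the earliest completion time among all unassigned tasks."""
    ready = {r.name: 0.0 for r in resources}
    schedule = {}
    pending = list(tasks)
    while pending:
        best = None
        for t in pending:
            for r in eligible(t, resources):
                finish = ready[r.name] + t.work / r.speed
                if best is None or finish < best[0]:
                    best = (finish, t, r)
        finish, t, r = best
        ready[r.name] = finish
        schedule[t.name] = r.name
        pending.remove(t)
    return schedule, max(ready.values())  # (assignment, makespan)


resources = [Resource("slow", speed=1.0, bandwidth=100.0),
             Resource("fast", speed=4.0, bandwidth=10.0)]
tasks = [Task("t1", work=8.0, min_bandwidth=0.0),
         Task("t2", work=8.0, min_bandwidth=0.0),
         Task("t3", work=4.0, min_bandwidth=50.0)]  # only "slow" meets this QoS

print(fcfs(tasks, resources))     # ({'t1': 'slow', 't2': 'fast', 't3': 'slow'}, 12.0)
print(min_min(tasks, resources))  # ({'t1': 'fast', 't2': 'fast', 't3': 'slow'}, 4.0)
```

On this toy input the FCFS baseline yields a makespan of 12 time units and Min-Min yields 4. Min-Min accounts for resource speed when placing the unconstrained tasks, which leaves the only QoS-eligible resource free for t3. The toy is not a reproduction of the reported 45% improvement.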
