Stability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid

Grid computing allows coordinated and controlled resource sharing and problem solving in dynamic, multi-institutional virtual organizations. Fault tolerance and task scheduling are important issues for large-scale computational grids because grid resources are inherently unreliable. A commonly used technique for achieving fault tolerance is periodic checkpointing, which saves the job's state at regular intervals. An inappropriate checkpointing interval, however, delays job execution and reduces throughput. This paper therefore aims to improve performance on the computational grid with a more effective and reliable fault-tolerant system based on a novel Stability Assessment Metamorphic Approach (SAMA). Fault tolerance is attained by dynamically adapting the checkpoints to the current status and past failure information of the resources, which is maintained in the information server. Effective scheduling is achieved through fault-tolerance-based scheduling, which determines the deviation rate of every node using high-stability assessment constraints. This enables jobs to be completed within their deadlines with improved throughput and helps make the grid environment trustworthy.
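The idea of adapting the checkpoint interval to a resource's failure history can be illustrated with a minimal sketch. It is not the SAMA procedure itself, which the abstract does not specify. As a stand-in, it assumes Young's first-order approximation for the checkpoint interval, sqrt(2 * C * MTBF), where C is the cost of one checkpoint and MTBF is the mean time between failures estimated from past failures. The function names, the argument layout, and the idea that the information server supplies a list of failure timestamps are all hypothetical.

```python
import math


def estimate_mtbf(failure_times, observed_span):
    """Estimate mean time between failures for one resource.

    failure_times: past failure timestamps (assumed to come from the
                   information server; this record layout is hypothetical).
    observed_span: length of the observation window, in the same time unit.
    """
    if not failure_times:
        # No failures recorded: use the whole window as a lower bound.
        return observed_span
    return observed_span / len(failure_times)


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the checkpoint interval."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def adapt_interval(failure_times, observed_span, checkpoint_cost):
    """Recompute the interval from the resource's current failure history."""
    mtbf = estimate_mtbf(failure_times, observed_span)
    return young_interval(checkpoint_cost, mtbf)


# A resource that failed 4 times in 2000 s gets a shorter interval than
# one with no recorded failures over the same window.
flaky = adapt_interval([100, 500, 900, 1500], 2000.0, 10.0)
stable = adapt_interval([], 2000.0, 10.0)
```

Under these assumptions, a failure-prone resource is checkpointed more often and a stable one less often, which is the trade-off between checkpoint overhead and lost work that the abstract describes.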
