Reliable and efficient hierarchical organization model for computational grid

Abstract Although the hierarchical model appears to be an effective way to organize resource management in grid systems, which place stringent demands on both scalability and efficiency, it has limitations that need to be addressed. For example, the master (manager) resources at each level are single points of failure, and they may become bottlenecks and sources of communication overhead, especially if they are not selected efficiently. Moreover, static structures cannot cope with the dynamic and fault-prone nature of grids, while manual construction and repair are prohibitive because of the high overhead they incur, which often obstructs efficient resource utilization (especially for resources with intermittent availability). This paper first introduces a self-repairing n-ary dynamic hierarchical grid model for scheduling and load balancing in which each master resource is replicated on one of its child resources. Second, it proposes an efficient methodology for electing master and replica resources. In this methodology, masters and replicas are selected based on both resource reliability (in terms of MTBF) and resource proximity to the other nodes in specified groups (in terms of communication latency). The proposed methodology, built on the proposed model, is validated via simulation. Experimental results show that the proposed model has a substantial impact on overall performance. Compared to other approaches, the simulations show that our approach reduces the average completion time (ACT) by 18.9%–25%, increases the tree stability ratio (TSR) by up to 26.2%–27.1%, and reduces the total communication overhead (TCO) by 4.4%–18.7% in the range of system parameter values examined.
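
The election criteria named in the abstract (reliability as MTBF, proximity as communication latency) can be illustrated with a minimal sketch. The abstract does not state how the two criteria are combined, so the weighted sum, the normalization, the `weight` parameter, and the names `Node` and `elect_master_and_replica` below are all assumptions made for illustration, not the paper's actual method. The sketch also assumes the replica is simply the second-ranked node of the group.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    mtbf_hours: float   # mean time between failures (higher is more reliable)
    latency_ms: dict    # hypothetical: latency to each other node, keyed by name


def mean_latency(node, group):
    """Average latency from `node` to every other node in the group."""
    others = [n for n in group if n.name != node.name]
    return sum(node.latency_ms[n.name] for n in others) / len(others)


def elect_master_and_replica(group, weight=0.5):
    """Rank nodes by an assumed weighted score and return (master, replica).

    `weight` balances reliability against proximity; it is an illustrative
    parameter, not one taken from the paper.
    """
    if len(group) < 2:
        raise ValueError("need at least two nodes to elect a master and a replica")

    max_mtbf = max(n.mtbf_hours for n in group)
    latencies = {n.name: mean_latency(n, group) for n in group}
    max_latency = max(latencies.values())

    def score(n):
        # Normalize both criteria to [0, 1]; larger is better for each.
        reliability = n.mtbf_hours / max_mtbf
        proximity = 1.0 - latencies[n.name] / max_latency if max_latency > 0 else 1.0
        return weight * reliability + (1.0 - weight) * proximity

    ranked = sorted(group, key=score, reverse=True)
    return ranked[0], ranked[1]


group = [
    Node("a", 1000.0, {"b": 10.0, "c": 10.0}),
    Node("b", 500.0, {"a": 10.0, "c": 50.0}),
    Node("c", 900.0, {"a": 10.0, "b": 50.0}),
]
master, replica = elect_master_and_replica(group)
print(master.name, replica.name)
```

In this toy group, node `a` is both the most reliable and the closest to its peers, so it ranks first; the replica is the next-best node, which would take over if the master failed.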
