GA Based Scheduling Model for Computational Grid to Minimize Turnaround Time

Scheduling on distributed systems is an NP-hard problem, and a grid, being a wide, heterogeneous and expandable system, makes scheduling even tougher. Genetic algorithms, based on natural selection and evolution, have gained popularity in recent times because of their effectiveness in handling optimization problems. This article proposes a job-scheduling model for a computational grid that uses a genetic algorithm with the objective of minimizing the turnaround time. The model evaluates the various clusters in the grid to find the most suitable one, that is, the cluster offering the minimum turnaround time for the job. Simulation studies compare the performance of this model with other similar models.

DOI: 10.4018/jghpc.2009070806. This paper appears in the International Journal of Grid and High Performance Computing, 1(4), 70-90, October-December 2009, edited by Emmanuel Udoh and Frank Zhigang Wang. © 2009, IGI Global.

aspects of the job, demanding high computational power, in order to maximize the system throughput while minimizing the turnaround time of the submitted job as well as balancing the load amongst the system resources. A computational grid, therefore, provides its constituents with computational power irrespective of their own computational capabilities (Tarricone and Esposito, 2005; Ernemann, Hamscher and Yahyapour, 2002; Hamscher, Schwiegelshohn, Streit and Yahyapour, 2005; Grid Computing Info centre, 2008). A job is considered to be a collection of modules submitted along with its Job Precedence Graph (JPG), in which the position of a module indicates its precedence level of execution.
The JPG further defines the degree of concurrency for the job execution. Thus, a job is a single set of concurrent modules to be executed on a set of resources. The resources could be computers, workstations, PDAs, storage devices and network links, constituting the grid. Scheduling is the problem of mapping these job modules onto the grid resources while respecting their precedence as defined in the JPG (Vidyarthi, Tripathi and Sarkar, 2001).

The genetic algorithm (GA) is an effective tool often used to solve problems in the NP class. GA has an advantage over conventional optimization techniques in that it works on a set of points rather than on a single point. Because it works over a set of candidate solutions simultaneously, the probability of settling on a false peak is reduced. GA is a random search method, but it differs from other similar methods in that it uses information generated in the past as well as present estimates (Goldberg, 2007; Mitchell, 1999).

The proposed model uses a genetic algorithm to schedule jobs on the grid with the objective of minimizing the turnaround time of the given jobs. A grid can be considered to consist of many specialized clusters. If the specialization of a cluster matches the nature of the job, the cluster is evaluated for the execution cost that it can offer to the job. Here, the nature of the job refers to its specifications and requirements in terms of resource characteristics. For example, a graphics application may need computational resources specialized for graphics-specific jobs, while a database application may demand resources that handle storage, data movement and retrieval quickly enough for efficient execution of the job.

The drawback of most task schedulers is that they consider the allocation of only one task over the nodes of the grid. This does not account for the previous workload on the nodes and therefore produces an unrealistic schedule.
Moreover, the single entry point for the job becomes a bottleneck in the system's performance, which is the case with many grid schedulers. Further, they do not consider the job's requirements in terms of resources, and they assume control over the scheduling policy of the independent nodes, which is also unrealistic (Afzal, Darlington and McGough, 2006; Aggarwal and Aggarwal, 2006; Dalheimer, Pfreundt and Merz, 2005; Ranjan, Harwood and Buyya, 2006; Shan, Hongzhang, Oliker and Biswas, 2003; Fidanova and Durchova, 2006; Spooner, Jarvis, Cao, Saini and Nudd, 2003; Wiriyaprasit, Muangsin and Veera, 2004).

The use of genetic algorithms for dynamic load balancing has been observed and reported in (Zomaya, Yee-Hwei, 2001). A genetic algorithm has also been used to design and implement a scheduler that minimizes the turnaround time, the makespan and the idle time of the resources for a job under real-time constraints (Aggarwal, Kent and Ngom, 2005). A Pareto algorithm using GA has been applied to wireless grids (Benedict and Vasudevan, 2008). Since GA itself leaves room for improvement, an improved GA scheme for faster optimization has been proposed in (Yin, Wu and Zhou, 2007). A threshold selection operator has been proposed in (Khanbary, Mohammed and Vidyarthi, 2009) for improved results.

The rest of the article is organized as follows. GA is briefly discussed in the next section, followed by the proposed GA-based scheduler along with the assumptions, data structures and notation used. The fitness function (cost estimation) of the job allocation is derived
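
To make the idea of GA-based module allocation concrete, the following is a minimal sketch, not the model from the article. It assumes a hypothetical job whose modules each carry a JPG precedence level and an amount of work, and a hypothetical cluster whose nodes each have a speed and an existing backlog (the previous workload mentioned above). The cost function `turnaround_time`, the level-by-level barrier, and all names and numbers are illustrative assumptions; the article derives its own fitness function, which is not reproduced here.

```python
import random

# Hypothetical job: one entry per module, (JPG precedence level, work units).
MODULES = [(0, 40.0), (1, 30.0), (1, 50.0), (2, 20.0)]
# Hypothetical cluster: one entry per node, (work units per second, backlog in seconds).
NODES = [(10.0, 2.0), (5.0, 0.0), (20.0, 6.0)]


def turnaround_time(assignment, modules=MODULES, nodes=NODES):
    """Assumed cost: finish time of the last precedence level.

    assignment[i] is the node index chosen for module i. Modules on the same
    level may run concurrently on different nodes; a level starts only after
    the previous level has finished everywhere (a simplifying assumption).
    """
    ready = [backlog for _, backlog in nodes]  # time at which each node is free
    barrier = 0.0                              # finish time of the previous level
    for level in sorted({lvl for lvl, _ in modules}):
        finish = barrier
        for (lvl, work), node in zip(modules, assignment):
            if lvl != level:
                continue
            start = max(ready[node], barrier)
            ready[node] = start + work / nodes[node][0]
            finish = max(finish, ready[node])
        barrier = finish
    return barrier


def evolve(generations=50, pop_size=20, mutation_rate=0.1, seed=0):
    """Plain GA: binary tournament, single-point crossover, mutation, elitism."""
    rng = random.Random(seed)
    n_mod, n_nodes = len(MODULES), len(NODES)
    pop = [[rng.randrange(n_nodes) for _ in range(n_mod)] for _ in range(pop_size)]
    history = []  # best cost seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=turnaround_time)
        history.append(turnaround_time(pop[0]))
        nxt = [pop[0][:]]  # elitism: carry the best chromosome over unchanged
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 2), key=turnaround_time)
            p2 = min(rng.sample(pop, 2), key=turnaround_time)
            cut = rng.randrange(1, n_mod)
            child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n_nodes) if rng.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=turnaround_time)
    history.append(turnaround_time(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("assignment:", best, "turnaround:", history[-1])
```

Because the population is a set of candidate assignments rather than a single one, each generation compares many allocations at once, and elitism guarantees that the best turnaround time found so far never gets worse. Under these assumed numbers, `turnaround_time([0, 0, 2, 1])` evaluates to `13.0`, while placing every module on the slowest node, `[1, 1, 1, 1]`, gives `28.0`.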

[1]  Melanie Mitchell,et al.  An introduction to genetic algorithms , 1996 .

[2]  Ramin Yahyapour,et al.  Design and evaluation of job scheduling strategies for grid computing , 2000, GRID.

[3]  Ali Afzal,et al.  Capacity Planning and Stochastic Scheduling in Large-Scale Grids , 2006, 2006 Second IEEE International Conference on e-Science and Grid Computing (e-Science'06).

[4]  Huizhong Wu,et al.  A Task Duplication Based Scheduling Algorithm on GA in Grid Computing Systems , 2005, ICNC.

[5]  Alioune Ngom,et al.  Genetic algorithm based scheduler for computational grids , 2005, 19th International Symposium on High Performance Computing Systems and Applications (HPCS'05).

[6]  Stefka Fidanova,et al.  Ant Algorithm for Grid Scheduling Problem , 2005, LSSC.

[7]  Nik Bessis,et al.  Managing Inconsistencies in Data Grid Environments: A Practical Approach , 2010, Int. J. Grid High Perform. Comput..

[8]  Shajulin Benedict,et al.  A Niched Pareto GA Approach for Scheduling Scientific Workflows in Wireless Grids , 2008, J. Comput. Inf. Technol..

[9]  Deo Prakash Vidyarthi,et al.  Modified Genetic Algorithm with Threshold Selection , 2009 .

[10]  Ekaterina Kldiashvili Grid Technologies for E-Health: Applications for Telemedicine Services and Delivery , 2010 .

[11]  Albert Y. Zomaya,et al.  Observations on Using Genetic Algorithms for Dynamic Load-Balancing , 2001, IEEE Trans. Parallel Distributed Syst..

[12]  Hongzhang Shan,et al.  Job Superscheduler Architecture and Performance in Computational Grid Environments , 2003, ACM/IEEE SC 2003 Conference (SC'03).

[13]  Luciano Tarricone,et al.  Grid computing for electromagnetics , 2004 .

[14]  Peter Merz,et al.  Calana: a general-purpose agent-based grid scheduler , 2005, HPDC-14. Proceedings. 14th IEEE International Symposium on High Performance Distributed Computing, 2005..

[15]  S. Wiriyaprasit,et al.  The impact of local priority policies on grid scheduling performance and an adaptive policy-based grid scheduling algorithm , 2004, Proceedings. Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region, 2004..

[16]  Xiaolin Li,et al.  Scheduling Large-Scale DNA Sequencing Applications , 2010 .

[17]  Zahid Raza,et al.  A Fault Tolerant Grid Scheduling Model to Minimize Turnaround Time , 2008, HPCNCS.

[18]  Jack Dongarra,et al.  Handbook of Research on Scalable Computing Technologies , 2009 .

[19]  Anil Kumar Tripathi,et al.  Allocation Aspects in Distributed Computing Systems , 2001 .

[20]  Akshai K. Aggarwal,et al.  A Unified Scheduling Algorithm for Grid Applications , 2006, 20th International Symposium on High-Performance Computing in an Advanced Collaborative Environment (HPCS'06).

[21]  Hao Yin,et al.  An Improved Genetic Algorithm with Limited Iteration for Grid Scheduling , 2007, Sixth International Conference on Grid and Cooperative Computing (GCC 2007).

[22]  Subhash Saini,et al.  Local grid scheduling techniques using performance prediction , 2003 .

[23]  Rajkumar Buyya,et al.  SLA-Based Coordinated Superscheduling Scheme for Computational Grids , 2006, 2006 IEEE International Conference on Cluster Computing.

[24]  Ramin Yahyapour,et al.  Benefits of global grid computing for job scheduling , 2004, Fifth IEEE/ACM International Workshop on Grid Computing.