Embarrassingly parallel jobs are not embarrassingly easy to schedule on the grid

Embarrassingly parallel applications represent an important workload in today's grid environments. Scheduling and executing this class of applications is generally considered a trivial, well-understood process on homogeneous clusters. Grid environments provide the necessary computational resources, but the accompanying resource heterogeneity makes efficient execution of these applications' tasks across multiple resources a new challenge. This paper presents a set of examples illustrating how the execution characteristics of individual tasks, and consequently of a whole job, are affected by the choice of execution resource, task invocation parameters, and task input data attributes. The aim of this work is to highlight this relationship between an application and an execution resource in order to promote the development of better metascheduling techniques for the grid. Exploiting this relationship can maximize application throughput and also yield higher resource utilization. To achieve these benefits, a set of job scheduling and execution concerns is derived, leading toward a computational pipeline for scheduling embarrassingly parallel applications in grid environments.
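The effect of resource heterogeneity on job completion can be illustrated with a toy model. The sketch below is not the paper's method. It assumes a hypothetical set of three resources with fixed relative speeds and a job made of identical, independent tasks, and it compares a heterogeneity-oblivious round-robin assignment with a greedy earliest-completion-time heuristic (a standard list-scheduling policy swapped in here for illustration). The names `Resource`, `round_robin`, `earliest_completion`, and `fresh_resources` are hypothetical. Real grid tasks also vary with invocation parameters and input data, which this model ignores.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A hypothetical grid resource; speed is work units processed per time unit."""
    name: str
    speed: float
    busy_until: float = 0.0  # time at which this resource becomes free


def round_robin(tasks, resources):
    """Heterogeneity-oblivious baseline: deal tasks out in turn, ignoring speed.

    Returns the makespan (finish time of the last resource)."""
    for i, work in enumerate(tasks):
        r = resources[i % len(resources)]
        r.busy_until += work / r.speed
    return max(r.busy_until for r in resources)


def earliest_completion(tasks, resources):
    """Greedy heterogeneity-aware policy: place each task (largest first) on the
    resource where it would finish soonest, given that resource's speed.

    Returns the makespan (finish time of the last resource)."""
    for work in sorted(tasks, reverse=True):
        r = min(resources, key=lambda res: res.busy_until + work / res.speed)
        r.busy_until += work / r.speed
    return max(r.busy_until for r in resources)


def fresh_resources():
    # Illustrative, assumed speeds only: one fast resource and two slower ones.
    return [Resource("fast", 4.0), Resource("medium", 2.0), Resource("slow", 1.0)]


if __name__ == "__main__":
    tasks = [8.0] * 12  # twelve identical, independent tasks of 8 work units each
    print("round-robin makespan:", round_robin(tasks, fresh_resources()))
    print("earliest-completion makespan:", earliest_completion(tasks, fresh_resources()))
```

Under these assumed speeds, round-robin gives each resource four tasks, so the slow resource finishes last at 32.0 time units, while the greedy policy shifts most tasks onto the faster resources and finishes at 16.0. The only change between the two runs is whether the scheduler accounts for which resource a task lands on.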