Processing moldable tasks on the grid: Late job binding with lightweight user-level overlay

Independent observations and everyday user experience indicate that the performance and reliability of large grid infrastructures may suffer from large and unpredictable variations. In this paper we study the impact of job queuing time on the processing of moldable tasks, which are commonly found in large-scale production grids. We use the mean and variance of the makespan as quality-of-service indicators. We develop a general task processing model to compare quantitatively two models, early and late job binding, in a user-level overlay applied to the EGEE Grid infrastructure. We find that the late-binding model effectively defines a transformation of the makespan distribution in accordance with the Central Limit Theorem. As demonstrated by Monte Carlo simulations using real job traces, this transformation substantially reduces the mean and variance of the makespan. For certain classes of applications, task granularity may be adjusted so that a speedup of an order of magnitude or more is achieved. We use this result to propose a general strategy for managing access to resources and optimizing workload, based on the Ganga and DIANE user-level overlay tools. Key features of this approach include a late-binding scheduler; the ability to interface to a wide range of distributed systems; the ability to extend and customize the system to cover application-specific scheduling and processing patterns; and ease of use and lightweight deployment in user space. We discuss the impact of this approach on some practical applications where efficient processing of many tasks is required to solve scientific problems.
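The contrast between early and late binding can be illustrated with a small Monte Carlo sketch. This is not the paper's model and does not use real job traces. It assumes synthetic log-normal queuing times with arbitrarily chosen parameters, identical task durations, and zero scheduling overhead; all function names and parameters are hypothetical. In the early-binding case each task is tied to one job at submission, so the slowest job determines the makespan. In the late-binding case worker agents pull tasks from a shared queue as soon as they start running.

```python
import heapq
import random
import statistics


def early_binding_makespan(queue_times, task_time):
    # Each task is bound to its own job at submission time,
    # so the job that waits longest in the queue decides the makespan.
    return max(q + task_time for q in queue_times)


def late_binding_makespan(queue_times, n_tasks, task_time):
    # Each worker becomes available after its queuing time and then
    # pulls tasks one at a time from a shared queue until none remain.
    free_at = list(queue_times)
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        start = heapq.heappop(free_at)   # earliest available worker
        end = start + task_time
        finish = max(finish, end)
        heapq.heappush(free_at, end)     # worker is free again at `end`
    return finish


def simulate(n_runs=200, n_jobs=100, total_work=100 * 3600.0,
             tasks_per_job=1, seed=1):
    # Assumption: queuing times are log-normal (mu=6.0, sigma=1.5, seconds).
    # These parameters are illustrative, not fitted to any grid trace.
    rng = random.Random(seed)
    n_tasks = n_jobs * tasks_per_job          # finer granularity -> more tasks
    early, late = [], []
    for _ in range(n_runs):
        queue_times = [rng.lognormvariate(6.0, 1.5) for _ in range(n_jobs)]
        early.append(early_binding_makespan(queue_times, total_work / n_jobs))
        late.append(late_binding_makespan(queue_times, n_tasks,
                                          total_work / n_tasks))
    return {
        "early_mean": statistics.mean(early),
        "early_stdev": statistics.stdev(early),
        "late_mean": statistics.mean(late),
        "late_stdev": statistics.stdev(late),
    }


if __name__ == "__main__":
    for key, value in simulate(tasks_per_job=10).items():
        print(f"{key}: {value:.0f} s")
```

Increasing the hypothetical `tasks_per_job` parameter splits the same total work into smaller tasks, which is one way to explore the effect of task granularity under these simplified assumptions.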
