GWpilot: Enabling multi-level scheduling in distributed infrastructures with GridWay and pilot jobs

Abstract Current systems based on pilot jobs do not exploit all the scheduling advantages that the technique offers, or they lack compatibility or adaptability. To overcome these limitations, this study presents a general-purpose pilot system, GWpilot. The system provides individual users and institutions with a framework for running legacy applications efficiently that is easier to use and install, as well as scalable, extensible, flexible and adjustable. The framework is based on the GridWay meta-scheduler and incorporates its powerful features, such as standard interfaces, fair-share policies, ranking, migration, accounting and compatibility with diverse infrastructures. GWpilot goes beyond establishing simple network overlays to overcome waiting times in remote queues or to improve the reliability of task production. It tackles the characterisation problem in current infrastructures by allowing users to incorporate arbitrary customised monitoring of resources and of their running applications into the system. This functionality allows the framework to implement innovative scheduling algorithms that meet the computational needs of a wide range of calculations faster and more efficiently. The system can also be easily stacked under other software layers, such as self-schedulers. The advanced techniques included by default in the framework result in significant performance improvements even when very short tasks are scheduled.
