Workflow Fairness Control on Online and Non-clairvoyant Distributed Computing Platforms

Fairly allocating distributed computing resources among workflow executions is critical to multi-user platforms. However, this problem has mostly been studied under clairvoyant and offline conditions, where task durations on resources are known, or where the workload and available resources do not vary over time. We consider a non-clairvoyant, online fairness problem where the platform workload, task costs and resource characteristics are unknown and non-stationary. We propose a fairness control loop that assigns task priorities based on the fraction of pending work in each workflow. Workflow characteristics and performance on the target resources are estimated progressively, as information becomes available during the execution. Our method is implemented and evaluated on four different applications executed in production conditions on the European Grid Infrastructure. Results show that our technique reduces slowdown variability by a factor of 3 to 7 compared to first-come-first-served scheduling.
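The control loop summarized above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a workflow's pending work is estimated online as its number of pending tasks times the mean duration of its tasks completed so far, and that re-prioritization is triggered when the spread of pending-work fractions across workflows exceeds a threshold. The class `WorkflowState`, the functions `assign_priorities` and `control_step`, the prior `default_duration` and the `threshold` value are all hypothetical; this is not the authors' implementation.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class WorkflowState:
    """Hypothetical per-workflow bookkeeping, updated as tasks complete."""
    name: str
    pending_tasks: int  # tasks not yet completed (queued or running)
    completed_durations: list = field(default_factory=list)  # observed durations (s)
    default_duration: float = 1.0  # assumed prior used before any task completes

    def estimated_task_duration(self) -> float:
        # Non-clairvoyant estimate: mean of the durations observed so far.
        if self.completed_durations:
            return mean(self.completed_durations)
        return self.default_duration

    def pending_fraction(self) -> float:
        # Fraction of the workflow's estimated total work that is still pending.
        pending = self.pending_tasks * self.estimated_task_duration()
        done = sum(self.completed_durations)
        total = pending + done
        return pending / total if total > 0 else 0.0


def assign_priorities(workflows):
    """Give the highest priority to the workflow with the most work still pending."""
    ranked = sorted(workflows, key=lambda w: w.pending_fraction(), reverse=True)
    return {w.name: len(ranked) - i for i, w in enumerate(ranked)}


def control_step(workflows, threshold=0.2):
    """One iteration of the loop: re-prioritize only if workflows drift apart."""
    fractions = [w.pending_fraction() for w in workflows]
    unfairness = max(fractions) - min(fractions)
    if unfairness > threshold:
        return assign_priorities(workflows)
    return None  # keep the current priorities
```

For example, a workflow with 90 pending tasks and 10 completed 100 s tasks has a pending fraction of 0.9, while one with 10 pending tasks and 90 completed 50 s tasks has 0.1; `control_step` then ranks the first workflow above the second. Using a threshold avoids reshuffling priorities on every small fluctuation of the estimates.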
