Different aspects of workflow scheduling in large-scale distributed systems

Abstract

As large-scale distributed systems gain momentum, the scheduling of workflow applications with multiple requirements on such computing platforms has become a crucial area of research. In this paper, we investigate the workflow scheduling problem in large-scale distributed systems from the Quality of Service (QoS) and data locality perspectives. We present a scheduling approach that considers two models of synchronization between the tasks of a workflow application: (a) communication through the network and (b) communication through temporary files. Specifically, we use simulation to investigate the performance of a heterogeneous distributed system in which multiple soft real-time workflow applications arrive dynamically. The applications are scheduled under various tardiness bounds, taking into account the communication cost in the first case study and the I/O cost and data locality in the second. The simulation results provide useful insights into the impact of the tardiness bound and data locality on system performance.
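To make the tardiness bound and the two synchronization models more concrete, a minimal Python sketch follows. It is not the scheduling approach evaluated in this paper. The `Workflow` class, the function names, and the linear cost formulas are illustrative assumptions. Transfer time is taken to be data size divided by bandwidth. The file-based cost is taken to be write time plus read time, with a faster read when the temporary file is local to the child task.

```python
from dataclasses import dataclass

# Illustrative assumptions only; these are not the formulas used in the paper.


@dataclass
class Workflow:
    name: str
    deadline: float         # absolute end-to-end deadline (assumed)
    tardiness_bound: float  # max tolerated lateness for a soft real-time job (assumed)


def tardiness(finish_time: float, deadline: float) -> float:
    """Time by which a workflow overruns its deadline; 0.0 if it is on time."""
    return max(0.0, finish_time - deadline)


def meets_tardiness_bound(wf: Workflow, finish_time: float) -> bool:
    """A soft real-time workflow is acceptable if its tardiness stays within its bound."""
    return tardiness(finish_time, wf.deadline) <= wf.tardiness_bound


def network_comm_cost(data_size: float, bandwidth: float, same_processor: bool) -> float:
    """Model (a): parent-to-child data sent over the network.

    Assumed: no transfer is needed when both tasks run on the same processor.
    """
    return 0.0 if same_processor else data_size / bandwidth


def file_comm_cost(data_size: float, write_rate: float, local_read_rate: float,
                   remote_read_rate: float, data_local: bool) -> float:
    """Model (b): the parent writes a temporary file that the child reads.

    Assumed: reading is faster when the file resides where the child runs,
    which is how data locality lowers the I/O cost in this sketch.
    """
    read_rate = local_read_rate if data_local else remote_read_rate
    return data_size / write_rate + data_size / read_rate


if __name__ == "__main__":
    wf = Workflow("w1", deadline=10.0, tardiness_bound=3.0)
    print(meets_tardiness_bound(wf, 12.0))  # True: 2.0 time units late, bound is 3.0
    print(meets_tardiness_bound(wf, 14.0))  # False: 4.0 time units late
    print(network_comm_cost(100.0, 10.0, same_processor=False))        # 10.0
    print(file_comm_cost(100.0, 50.0, 100.0, 25.0, data_local=True))   # 3.0
    print(file_comm_cost(100.0, 50.0, 100.0, 25.0, data_local=False))  # 6.0
```

Under these assumptions, placing dependent tasks on the same processor removes the cost entirely in model (a). In model (b), data locality only shortens the read phase, because the write cost is paid either way.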
