Paper Information - A Data-Aware Scheduling Strategy for Executing Large-Scale Distributed Workflows

A Data-Aware Scheduling Strategy for Executing Large-Scale Distributed Workflows

Task scheduling is a crucial component for the efficient execution of data-intensive applications in distributed environments, where many machines must be coordinated to reduce execution times and bandwidth consumption. This paper presents ADAGE, a data-aware scheduler designed to efficiently execute data-intensive workflows on large-scale computers. The proposed scheduler is based on three key features: (i) critical path analysis, for discovering the critical tasks of a workflow and reducing data transfers between nodes; (ii) work giving, a new dynamic planning strategy for migrating tasks from overloaded to unloaded nodes; and (iii) task replication, which executes task replicas on different nodes to improve both execution time and fault tolerance. Experiments performed on a distributed computing environment composed of up to 1,024 processing nodes show that ADAGE achieves better performance than existing scheduling systems, obtaining an average reduction of up to 66% in execution time.
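
To illustrate the first feature, the following is a minimal sketch of generic critical path analysis on a workflow DAG. It is not ADAGE's implementation: the function name `critical_path`, the task names, and the run-time estimates are all hypothetical, and data-transfer costs are ignored.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return (length, path) of the longest-duration path in a task DAG.

    durations: task -> estimated run time (hypothetical estimates)
    deps:      task -> set of tasks that must finish before it starts
    """
    graph = {task: deps.get(task, ()) for task in durations}
    order = TopologicalSorter(graph).static_order()  # predecessors come first

    finish = {}  # earliest finish time of each task
    prev = {}    # predecessor on the longest path into each task
    for task in order:
        # The task can start only after its slowest predecessor finishes.
        best = max(deps.get(task, ()), key=lambda p: finish[p], default=None)
        start = finish[best] if best is not None else 0
        finish[task] = start + durations[task]
        prev[task] = best

    # Walk back from the task that finishes last to recover the path.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return length, path[::-1]


# Hypothetical five-task workflow.
durations = {"load": 2, "clean": 3, "train": 5, "stats": 1, "report": 1}
deps = {
    "clean": {"load"},
    "train": {"clean"},
    "stats": {"load"},
    "report": {"train", "stats"},
}
print(critical_path(durations, deps))
# → (11, ['load', 'clean', 'train', 'report'])
```

Tasks on the returned path determine the minimum completion time of the workflow, which is why a scheduler might prioritize them and try to keep them close to their input data.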

Domenico Talia | Fabrizio Marozzo | Paolo Trunfio | Loris Belcastro | Salvatore Giampà
