Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges

With the explosive growth of big data, workloads are becoming more complex and computationally demanding. Such applications are processed on distributed, interconnected resources that keep growing in scale and computational capacity. Data-intensive applications may exhibit different degrees of parallelism and must exploit data locality effectively. Furthermore, they may impose several Quality of Service requirements, such as time constraints and resilience against failures, as well as other objectives, such as energy efficiency. These workload features, together with the inherent characteristics of the computing resources required to process them, present major challenges that call for effective scheduling techniques. In this chapter, we propose a classification of data-intensive workloads and give an overview of the most commonly used approaches for scheduling them in large-scale distributed systems. We also present novel strategies that have been proposed in the literature and shed light on open challenges and future directions.
