Optimal Task Scheduling for Distributed Cluster With Active Storage Devices and Accelerated Nodes

With advancements in compute-intensive and memory-bound applications, the need for faster and more energy-efficient processing platforms continues to grow. In support of these advancements, heterogeneous platforms have been proposed to enhance performance and efficiency in the cloud. These platforms include field-programmable gate arrays and graphics processing units in addition to general-purpose processors. Furthermore, there is strong interest in advancing active solid-state drives to support both storage and computation. In this paper, we present a generic formulation for modeling such a heterogeneous cloud environment, without being specific to a particular cloud platform such as Spark or Hadoop. We represent the cloud as a collection of clients, middleware control nodes, and high performance compute nodes (HPNs), where the HPNs represent the available advanced compute technologies in a heterogeneous cloud. The objective of the paper is to present a simple and efficient formulation for scheduling applications in such a heterogeneous cloud. Consistent with recent software modeling of artificial intelligence applications, we propose to map applications into directed acyclic graph representations of tasks. The optimization problem is then formulated to find the best scheduling of tasks on the HPNs, minimizing the overall execution time and the data communication delays between nodes. Unlike existing scheduling algorithms that assume equal performance across nodes, our formulation explicitly takes into account the different compute capabilities of the heterogeneous nodes. The resulting task schedule is then evaluated to provide insights into the performance gains of the proposed heterogeneous cloud computing environment. The results show improved performance when comparing the proposed task-scheduling algorithm with the genetic algorithm and the heterogeneous earliest finish time algorithm. We also show the performance gains achieved with the optimal task scheduling on a heterogeneous cloud system as compared with a conventional CPU-only cloud system.
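To make the scheduling problem concrete, the following is a minimal Python sketch, not the paper's formulation. It assumes a hypothetical four-task directed acyclic graph, two hypothetical node types ("cpu" and "gpu") with made-up per-task execution times, and a made-up communication cost that applies only when a parent and child task run on different nodes. It enumerates every task-to-node assignment by brute force and keeps the one with the smallest makespan. Tasks on the same node run in a fixed topological order, which is a simplification.

```python
from itertools import product

# Hypothetical DAG: (parent, child) -> data volume sent from parent to child.
edges = {("A", "B"): 2.0, ("A", "C"): 1.0, ("B", "D"): 1.0, ("C", "D"): 3.0}
tasks = ["A", "B", "C", "D"]  # listed in a valid topological order

# Hypothetical execution time of each task on each node type.
exec_time = {
    "A": {"cpu": 4.0, "gpu": 2.0},
    "B": {"cpu": 6.0, "gpu": 1.5},
    "C": {"cpu": 3.0, "gpu": 3.0},
    "D": {"cpu": 5.0, "gpu": 2.5},
}
nodes = ["cpu", "gpu"]
COMM_DELAY_PER_UNIT = 1.0  # assumed delay per unit of data between different nodes


def makespan(assignment):
    """Finish time of the last task for a given task -> node assignment."""
    node_free = {n: 0.0 for n in nodes}  # time at which each node becomes idle
    finish = {}
    for t in tasks:
        node = assignment[t]
        ready = 0.0
        for (parent, child), volume in edges.items():
            if child == t:
                # Data arrives immediately if both tasks share a node.
                same_node = assignment[parent] == node
                delay = 0.0 if same_node else volume * COMM_DELAY_PER_UNIT
                ready = max(ready, finish[parent] + delay)
        start = max(ready, node_free[node])
        finish[t] = start + exec_time[t][node]
        node_free[node] = finish[t]
    return max(finish.values())


def best_schedule():
    """Brute-force search over all assignments; only viable for tiny graphs."""
    best = None
    for choice in product(nodes, repeat=len(tasks)):
        assignment = dict(zip(tasks, choice))
        span = makespan(assignment)
        if best is None or span < best[0]:
            best = (span, assignment)
    return best


if __name__ == "__main__":
    span, assignment = best_schedule()
    cpu_only = makespan({t: "cpu" for t in tasks})
    print("best assignment:", assignment, "makespan:", span)
    print("CPU-only makespan:", cpu_only)
```

Exhaustive search grows exponentially with the number of tasks, so it serves only to illustrate the objective. A realistic instance would need an optimization solver or a heuristic such as those the abstract compares against.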
