A load balance multi-scheduling model for OpenCL kernel tasks in an integrated cluster

Modern embedded systems comprise heterogeneous multi-core architectures, i.e., CPUs and GPUs. These architectures offer substantial performance benefits, but only when an application is mapped to an appropriate processing core. Typically, programmers map sequential applications to the CPU and parallel applications to the GPU. This task mapping becomes challenging as CPU- and GPU-based architectures grow more complex and continue to evolve. This paper presents an approach that maps OpenCL applications to a heterogeneous multi-core architecture by determining application suitability and processing capability. The classification is performed by a machine learning-based device suitability classifier that predicts which processor has the highest computational compatibility for running a given OpenCL application. We propose 20 distinct features, extracted with an LLVM-based static analyzer developed for this purpose. To select the best feature subset, feature selection is performed using both correlation analysis and the feature importance method. To address the class imbalance problem, we apply the synthetic minority over-sampling technique (SMOTE) and compare its results with and without feature selection. Instead of hand-tuning the machine learning classifier, we use the tree-based pipeline optimization method to select the best classifier and its hyper-parameters. We then compare the optimized, selected method with traditional algorithms, i.e., random forest, decision tree, Naïve Bayes and KNN. We apply our approach to widely used OpenCL benchmarks, i.e., AMD and Polybench. The dataset contains 653 training and 277 testing applications. We evaluate the classification results using four performance metrics, i.e., F-measure, precision, recall and $R^2$. The optimized model with the reduced feature subset achieved an F-measure of 0.91 and an $R^2$ of 0.76. The proposed framework automatically distributes the workload based on application requirements and processor compatibility.
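
The over-sampling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smote`, the neighbour count `k`, and the two-feature toy data are assumptions made for illustration only. The sketch shows the core idea of synthetic minority over-sampling, in which each new sample is placed on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples by interpolating between minority points.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two static features per kernel for the
# under-represented device class (values are made up for illustration).
minority_kernels = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3], [1.1, 0.25]]
extra = smote(minority_kernels, n_new=6)
balanced_minority = minority_kernels + extra
print(len(balanced_minority))  # 10
```

Because every synthetic sample is an interpolation of two existing ones, the new points stay inside the range already covered by the minority class rather than duplicating rows verbatim. In practice such over-sampling would be applied to the training split only, so that the test applications remain untouched.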
