Fine-Grained Communication-Aware Task Scheduling Approach for Acyclic and Cyclic Applications on MPSoCs

Fine-grained task models can exploit parallelism to achieve high performance on multiprocessor systems-on-chip (MPSoCs). However, fine-grained models suffer from high communication overhead and difficult scheduling decisions, and these two challenges are interdependent. To address them, this paper presents a thorough analysis of the fine-grained communication optimization technique and the communication pipeline, from both time and topology perspectives, and proposes a static fine-grained communication-aware task scheduling (FCATS) approach, which integrates scheduling with the communication pipeline for acyclic and cyclic applications based on the fine-grained Simulink model. To meet different user demands, the approach offers two variants: search-based scheduling that combines a genetic algorithm with integer linear programming (GA-ILP) to obtain high-quality solutions, and hybrid GA-heuristic scheduling that trades some solution quality for a short solving time. Experimental results with both synthetic and real-life benchmarks on 4-, 8-, and 16-CPU platforms show that the approach improves performance compared with previous works.
