Paper Information - Fine-Grained Communication-Aware Task Scheduling Approach for Acyclic and Cyclic Applications on MPSoCs

Fine-grained task models can exploit parallelism to achieve high performance on multiprocessor system-on-chip (MPSoC) platforms. However, they suffer from high communication overhead and difficult scheduling decisions, and these two challenges are interdependent. To address them, this paper presents a full analysis of the fine-grained communication optimization technique and the communication pipeline from both time and topology perspectives, and proposes a static fine-grained communication-aware task scheduling (FCATS) approach that integrates scheduling with the communication pipeline for acyclic and cyclic applications based on the fine-grained Simulink model. The approach offers two options to meet different user demands: search-based scheduling that uses a genetic algorithm combined with integer linear programming (GA-ILP) to obtain high-quality solutions, and hybrid GA-heuristic scheduling with a short solving time. Experimental results with both synthetic and real-life benchmarks on 4-, 8-, and 16-CPU platforms show that the approach improves performance compared with previous works.
