TaskGenX: A Hardware-Software Proposal for Accelerating Task Parallelism

As chip multi-processors (CMPs) grow more complex, software solutions such as parallel programming models are attracting considerable attention. Task-based parallel programming models offer an appealing way to exploit complex CMPs. However, the growing number of cores on modern CMPs is pushing research toward fine-grained parallelism, and task-based programming models must handle such workloads while still delivering performance and scalability. Using specialized hardware to boost the performance of task-based programming models is common practice in the research community.
