Unleashing Fine-Grained Parallelism on Embedded Many-Core Accelerators with Lightweight OpenMP Tasking
Daniele Cesarini | Giuseppe Tagliavini | Andrea Marongiu
[1] Chris D. Marlin. Coroutines: A Programming Methodology, a Language Design and an Implementation, 1980, Lecture Notes in Computer Science.
[2] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[3] Eduard Ayguadé, et al. Nanos mercurium: A research compiler for OpenMP, 2004.
[4] D. Novillo. OpenMP and automatic parallelization in GCC, 2006.
[5] James Reinders, et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism, 2007.
[6] Christopher J. Hughes, et al. Carbon: architectural support for fine-grained parallelism on chip multiprocessors, 2007, ISCA '07.
[7] Muhammad Shafique, et al. RISPP: Rotating Instruction Set Processing Platform, 2007, 2007 44th ACM/IEEE Design Automation Conference.
[8] Alejandro Duran, et al. An adaptive cut-off for task parallelism, 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Alejandro Duran, et al. Evaluation of OpenMP Task Scheduling Strategies, 2008, IWOMP.
[10] Barbara M. Chapman, et al. Implementing OpenMP on a high performance embedded multicore MPSoC, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[11] Karl-Filip Faxén, et al. Wool-A work stealing library, 2008, CARN.
[12] Alejandro Duran, et al. Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP, 2009, 2009 International Conference on Parallel Processing.
[13] Charles E. Leiserson, et al. The Cilk++ concurrency platform, 2009, 2009 46th ACM/IEEE Design Automation Conference.
[14] Alejandro Duran, et al. The Design of OpenMP Tasks, 2009, IEEE Transactions on Parallel and Distributed Systems.
[15] Yi Guo, et al. Work-first and help-first scheduling policies for async-finish task parallelism, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[16] Muhammad Shafique, et al. KAHRISMA: A Novel Hypermorphic Reconfigurable-Instruction-Set Multi-grained-Array Architecture, 2010, 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010).
[17] Spiros N. Agathos, et al. Design and Implementation of OpenMP Tasks in the OMPi Compiler, 2011, 2011 15th Panhellenic Conference on Informatics.
[18] Kazuki Sakamoto, et al. Grand Central Dispatch, 2012.
[19] Jörg Henkel, et al. Invasive manycore architectures, 2012, 17th Asia and South Pacific Design Automation Conference.
[20] Luca Benini, et al. Platform 2012, a many-core computing accelerator for embedded SoCs: Performance evaluation of visual analytics applications, 2012, DAC Design Automation Conference 2012.
[21] Spiros N. Agathos, et al. Deploying OpenMP on an embedded multicore accelerator, 2013, 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS).
[22] Cheng Wang, et al. libEOMP: a portable OpenMP runtime library based on MCA APIs for embedded systems, 2013, PMAM '13.
[23] Luca Benini, et al. Enabling fine-grained OpenMP tasking on tightly-coupled shared memory clusters, 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[24] Alistair P. Rendell, et al. OpenMP on the Low-Power TI Keystone II ARM/DSP System-on-Chip, 2013, IWOMP.
[25] Alistair P. Rendell, et al. Implementation and Optimization of the OpenMP Accelerator Model for the TI Keystone II Architecture, 2014, IWOMP.
[26] Eduardo Quiñones, et al. P-SOCRATES: A Parallel Software Framework for Time-Critical Many-Core Systems, 2014, 2014 17th Euromicro Conference on Digital System Design.
[27] Luca Benini, et al. Architecture Support for Tightly-Coupled Multi-Core Clusters with Shared-Memory HW Accelerators, 2015, IEEE Transactions on Computers.
[28] Mats Brorsson, et al. A comparative performance study of common and popular task-centric programming frameworks, 2015, Concurr. Comput. Pract. Exp.
[29] Luca Benini, et al. Simplifying Many-Core-Based Heterogeneous SoC Programming With Offload Directives, 2015, IEEE Transactions on Industrial Informatics.
[30] Eduardo Quiñones, et al. Timing characterization of OpenMP4 tasking model, 2015, 2015 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES).
[31] Eduardo Quiñones, et al. OpenMP and timing predictability: A possible union?, 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[32] Indrani Paul, et al. Achieving Exascale Capabilities through Heterogeneous Computing, 2015, IEEE Micro.
[33] Sunita Chandrasekaran, et al. Exploring Task Parallelism for Heterogeneous Systems Using Multicore Task Management API, 2016, Euro-Par Workshops.
[34] Sven Karlsson, et al. Towards Unifying OpenMP Under the Task-Parallel Paradigm - Implementation and Performance of the taskloop Construct, 2016, IWOMP.
[35] VirtualSoC: A Research Tool for Modern MPSoCs, 2016, ACM Trans. Embed. Comput. Syst.
[36] Soonwook Hwang, et al. Resource Allocation Policies for Loosely Coupled Applications in Heterogeneous Computing Systems, 2016, IEEE Transactions on Parallel and Distributed Systems.
[37] Eduardo Quiñones, et al. Response-time analysis of DAG tasks under fixed priority scheduling with limited preemptions, 2016, 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[38] Maria A. Serrano, et al. A lightweight OpenMP4 run-time for embedded systems, 2016, 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC).
[39] Jie Shen, et al. Workload Partitioning for Accelerating Applications on Heterogeneous Platforms, 2016, IEEE Transactions on Parallel and Distributed Systems.
[40] Luca Benini, et al. Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs, 2017, IEEE Transactions on Parallel and Distributed Systems.
[41] Emmanuel Agullo, et al. Bridging the Gap Between OpenMP and Task-Based Runtime Systems for the Fast Multipole Method, 2017, IEEE Transactions on Parallel and Distributed Systems.
[42] Torsten Hoefler, et al. Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications, 2017, PPoPP.
[43] Jörg Henkel, et al. Timing Analysis of Tasks on Runtime Reconfigurable Processors, 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[44] Eduardo Quiñones, et al. An Analysis of Lazy and Eager Limited Preemption Approaches under DAG-Based Global Fixed Priority Scheduling, 2017, 2017 IEEE 20th International Symposium on Real-Time Distributed Computing (ISORC).