DuctTeip: A task-based parallel programming framework for distributed memory architectures
