Targeting Heterogeneous Architectures via Macro Data Flow

We propose a data-flow-based runtime system as an efficient tool for executing parallel code on heterogeneous architectures that host both multicore CPUs and GPUs. We discuss how the proposed runtime system can serve as the target of structured parallel applications developed with algorithmic skeletons or parallel design patterns, as well as of more "domain-specific" programming models. We present experimental results that demonstrate the feasibility of the approach.
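To make the idea of a macro data flow interpreter concrete, the following is a minimal sketch and not the runtime described in this work. It assumes that a macro data flow instruction is a coarse-grained function that becomes fireable once all of its input tokens have arrived, and that each instruction carries a device tag selecting a worker pool. The names `MDFInstruction`, `run_graph`, and `example` are hypothetical, and the "gpu" pool is an ordinary thread pool standing in for a real GPU offload path.

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


class MDFInstruction:
    """Hypothetical macro data flow instruction: fires when all inputs are present."""

    def __init__(self, name, fn, n_inputs, dests, device="cpu"):
        self.name = name
        self.fn = fn                      # coarse-grained function to execute
        self.tokens = [None] * n_inputs   # input token slots
        self.missing = n_inputs           # tokens still to arrive
        self.dests = dests                # list of (instruction name, input slot)
        self.device = device              # assumed tag: "cpu" or "gpu"


def run_graph(instructions, initial_tokens, cpu_workers=2, gpu_workers=1):
    """Interpret a macro data flow graph; return values of instructions with no destinations."""
    table = {instr.name: instr for instr in instructions}
    # One pool per device class; the "gpu" pool is only a stand-in (assumption).
    pools = {
        "cpu": ThreadPoolExecutor(max_workers=cpu_workers),
        "gpu": ThreadPoolExecutor(max_workers=gpu_workers),
    }
    pending = {}   # future -> instruction currently executing
    results = {}

    def deliver(name, slot, value):
        # Store a token; schedule the instruction once it has all of its inputs.
        instr = table[name]
        instr.tokens[slot] = value
        instr.missing -= 1
        if instr.missing == 0:
            future = pools[instr.device].submit(instr.fn, *instr.tokens)
            pending[future] = instr

    for name, slot, value in initial_tokens:
        deliver(name, slot, value)

    # Token routing happens only in this thread, so no locking is needed.
    while pending:
        done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
        for future in done:
            instr = pending.pop(future)
            value = future.result()
            if not instr.dests:
                results[instr.name] = value
            for name, slot in instr.dests:
                deliver(name, slot, value)

    for pool in pools.values():
        pool.shutdown()
    return results


def example():
    # Graph for (a + b) * (a - b); the multiply is tagged "gpu" for illustration only.
    graph = [
        MDFInstruction("add", lambda x, y: x + y, 2, [("mul", 0)]),
        MDFInstruction("sub", lambda x, y: x - y, 2, [("mul", 1)]),
        MDFInstruction("mul", lambda x, y: x * y, 2, [], device="gpu"),
    ]
    tokens = [("add", 0, 5), ("add", 1, 3), ("sub", 0, 5), ("sub", 1, 3)]
    return run_graph(graph, tokens)
```

In this sketch `add` and `sub` are independent, so they may run concurrently on the CPU pool, while `mul` fires only after both result tokens have been delivered. A skeleton or pattern front end would, under the same assumptions, compile its computation into such a graph instead of building it by hand.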
