Programmability and portability for exascale: Top down programming methodology and tools with StarSs

Abstract: StarSs is a task-based programming model that allows sequential applications to be parallelized by annotating the code with compiler directives. The model further supports transparent execution of designated tasks on heterogeneous platforms, including clusters of GPUs. This paper focuses on the methodology and tools that complement the programming model, forming a consistent development environment with the objective of simplifying the life of application developers. The programming environment includes the tools TAREADOR and TEMANEJO, which have been designed specifically for StarSs. TAREADOR, a Valgrind-based tool, enables a top-down development approach by assisting the programmer in identifying tasks and their data dependencies across all concurrency levels of an application. TEMANEJO is a graphical debugger that supports the programmer by visualizing the task dependency tree and by allowing task scheduling and dependencies to be manipulated. These tools are complemented by a set of performance analysis tools (Scalasca, Cube and Paraver) that enable fine-tuning of StarSs applications.
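To make the idea of directive-based task annotation concrete, the following is a minimal sketch in C. It does not use StarSs syntax, which the abstract does not show; as a stand-in it assumes the standard OpenMP `task` construct with `depend` clauses, which expresses the same idea of declaring each task's inputs and outputs so that a runtime can derive the dependencies. The function names (`scale`, `add`, `sum`, `run_pipeline`) and the data are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

#define N 4

/* Each function below is a candidate task: its pointer arguments
   are the data it reads and writes. All names are hypothetical. */
static void scale(const double *in, double *out, double factor) {
    for (size_t i = 0; i < N; ++i) out[i] = factor * in[i];
}

static void add(const double *a, const double *b, double *out) {
    for (size_t i = 0; i < N; ++i) out[i] = a[i] + b[i];
}

static double sum(const double *v) {
    double s = 0.0;
    for (size_t i = 0; i < N; ++i) s += v[i];
    return s;
}

/* Sequential code annotated with standard OpenMP task directives
   (an assumed analogue, NOT StarSs syntax). Without OpenMP support
   the pragmas are ignored and the code runs sequentially, giving
   the same result. */
double run_pipeline(void) {
    double x[N] = {1.0, 2.0, 3.0, 4.0};
    double y[N], z[N], w[N];
    double total = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        /* The two scale tasks only read x, so they may run concurrently. */
        #pragma omp task depend(in: x) depend(out: y)
        scale(x, y, 2.0);

        #pragma omp task depend(in: x) depend(out: z)
        scale(x, z, 3.0);

        /* add must wait for both producers of y and z. */
        #pragma omp task depend(in: y, z) depend(out: w)
        add(y, z, w);

        /* sum must wait for w. */
        #pragma omp task depend(in: w) depend(out: total)
        total = sum(w);

        /* Wait for all tasks before leaving the region. */
        #pragma omp taskwait
    }
    return total;
}
```

Calling `run_pipeline()` returns 50.0 whether or not the compiler honors the directives, which illustrates the property that the annotated program remains a valid sequential program. The dependency structure declared here (two independent producers, one consumer, one reduction) is the kind of task graph that a tool such as TEMANEJO is described as visualizing.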
