Analyzing Dynamic Task-Based Applications on Hybrid Platforms: An Agile Scripting Approach

In this paper, we present visual analysis techniques to evaluate the performance of HPC task-based applications on hybrid architectures. Our approach composes modern data analysis tools (pjdump, R, ggplot2, plotly) into an agile and flexible scripting framework at low development cost. We validate our proposal by analyzing traces from the full-fledged implementation of the Cholesky decomposition available in the MORSE library, running on a hybrid (CPU/GPU) platform. The analysis compares two different workloads and three different task schedulers from the StarPU runtime system. Our analysis, based on composite views, makes it possible to identify allocation mistakes, priority problems in scheduling decisions, GPU task anomalies that cause poor performance, and critical path issues.
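As an illustration of the scripting style described above, the following minimal sketch aggregates state durations per resource from a CSV dump of a trace. It is written in Python rather than R for brevity and is not the authors' code. The column layout (record kind, resource, type, start, end, duration, depth, value) is an assumption about a pjdump-like output, and the sample rows, resource names, and the function `busy_time_per_resource` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in an ASSUMED pjdump-like layout:
# kind, resource, type, start, end, duration, depth, value
SAMPLE = """\
State, CPU0, Worker State, 0.0, 2.0, 2.0, 0.0, dpotrf
State, CPU0, Worker State, 2.0, 5.0, 3.0, 0.0, dgemm
State, CUDA0, Worker State, 0.0, 1.0, 1.0, 0.0, dtrsm
State, CUDA0, Worker State, 1.0, 1.5, 0.5, 0.0, dgemm
"""


def busy_time_per_resource(text):
    """Sum the duration column of every State record, grouped by resource."""
    totals = defaultdict(float)
    # skipinitialspace drops the blank that follows each comma separator.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        # Ignore empty lines and any record that is not a State.
        if not row or row[0] != "State":
            continue
        resource = row[1]
        duration = float(row[5])
        totals[resource] += duration
    return dict(totals)


if __name__ == "__main__":
    for resource, busy in sorted(busy_time_per_resource(SAMPLE).items()):
        print(f"{resource}: {busy:.2f} time units busy")
```

A table of this kind is the sort of input that a plotting layer such as ggplot2 could then turn into per-resource or Gantt-like views; the exact views used in the paper are not reproduced here.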
