Estimating the WCET of GPU-Accelerated Applications Using Hybrid Analysis

The massive parallelism offered by Graphics Processing Units (GPUs) is now routinely exploited to accelerate computationally intensive tasks in a wide variety of application domains. Efficient GPU programming in languages such as CUDA and OpenCL requires careful application of hand optimisations to exploit parallelism and locality while minimising synchronisation. The effectiveness of such optimisations can be highly dependent on workload and the structure of input data, making it difficult to assess performance in general by testing alone. To address this, we study the problem of estimating the Worst-Case Execution Time (WCET) of GPU-accelerated applications. We propose the use of hybrid WCET analysis whereby execution times of small program segments are deduced from traces of execution and a calculation backend derived from the Control Flow Graph (CFG) produces a WCET estimate. Standard techniques which construct a CFG from a binary cannot be applied directly to GPU code because they miss implicit execution paths that arise due the way branches are implemented in hardware - we present a solution using standard compiler analysis. We further describe how to extend the basic hybrid WCET analysis of sequential code so that concurrent timing effects in the GPU execution model are incorporated. We have implemented our analysis as a tool built on top of the GPGPU-sim open source simulator. We evaluate our tool using a set of benchmarks drawn from the CUDA SDK: results show that effective modelling of concurrency is key to reducing pessimism in the WCET calculation.

[1]  Guillem Bernat,et al.  Tree-based WCET analysis on instrumentation point graphs , 2006, Ninth IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC'06).

[2]  Peter P. Puschner,et al.  Computing Maximum Task Execution Times — A Graph-Based Approach , 1997, Real-Time Systems.

[3]  G. Ramalingam,et al.  On loops, dominators, and dominance frontier , 2000, PLDI '00.

[4]  Henry Wong,et al.  Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[5]  Jakob Engblom,et al.  The worst-case execution-time problem—overview of methods and survey of tools , 2008, TECS.

[6]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[7]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[8]  Eric Darve,et al.  Liszt: A domain specific language for building portable mesh-based PDE solvers , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[9]  Joachim Wegener,et al.  A Comparison of Static Analysis and Evolutionary Testing for the Verification of Timing Constraints , 2004, Real-Time Systems.

[10]  Guillem Bernat,et al.  WCET analysis of probabilistic hard real-time systems , 2002, 23rd IEEE Real-Time Systems Symposium, 2002. RTSS 2002..

[11]  Alan Burns,et al.  A survey of hard real-time scheduling for multiprocessor systems , 2011, CSUR.

[12]  Henrik Theiling,et al.  Control flow graphs for real-time systems analysis: reconstruction from binary executables and usage in ILP-based path analysis , 2002 .

[13]  Guillem Bernat,et al.  Identifying irreducible loops in the Instrumentation Point Graph , 2011, J. Syst. Archit..

[14]  Adam Betts Reducing the Size of the Constraint Model in Implicit Path Enumeration Using Super Blocks , 2012, 2012 IEEE 33rd Real-Time Systems Symposium.

[15]  Adam Betts,et al.  Hybrid measurement-based WCET analysis using instrumentation point graphs , 2008 .

[16]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[17]  Jan Gustafsson,et al.  Loop Bound Analysis based on a Combination of Program Slicing, Abstract Interpretation, and Invariant Analysis , 2007, WCET.

[18]  Stefan M. Petters Comparison of Trace Generation Methods for Measurement Based WCET Analysis , 2003, WCET.

[19]  Amine Marref,et al.  WCET Analysis of Component-Based Systems Using Timing Traces , 2011, 2011 16th IEEE International Conference on Engineering of Complex Computer Systems.

[20]  Robert E. Tarjan Testing flow graph reducibility , 1973, STOC '73.

[21]  G. Ramalingam,et al.  On loops, dominators, and dominance frontiers , 2002, TOPL.

[22]  Joachim Wegener,et al.  A Comparison of Static Analysis and Evolutionary Testing for the Verification of Timing Constraints , 1998, Proceedings. Fourth IEEE Real-Time Technology and Applications Symposium (Cat. No.98TB100245).

[23]  David A. Patterson,et al.  Computer Architecture - A Quantitative Approach, 5th Edition , 1996 .

[24]  Björn Lisper Towards Parallel Programming Models for Predictability , 2012, WCET.

[25]  Stefan M. Petters,et al.  Experimental evaluation of code properties for WCET analysis , 2003, RTSS 2003. 24th IEEE Real-Time Systems Symposium, 2003.