Automated software testing of memory performance in embedded GPUs

Embedded and real-time software is often constrained by several temporal requirements. Therefore, it is important to design embedded software that meets its required performance goals. The advent of embedded graphics processing units (GPUs) brings fresh hope for developing high-performance embedded software that was previously not suitable for embedded platforms. Although GPUs use massive parallelism to obtain high throughput, the overall performance of an application running on an embedded GPU is often limited by memory performance. A crucial problem therefore lies in automatically detecting the inefficiency of software developed for embedded GPUs. In this paper, we propose GUPT, a novel test generation framework that systematically explores and detects poor memory performance of applications running on embedded GPUs. In particular, we systematically combine static analysis with dynamic test generation to expose likely execution scenarios with poor memory performance. Each test case in our generated test suite reports a potential memory-performance issue, along with the detailed information needed to reproduce it. We have implemented our test generation framework using GPGPU-Sim, a cycle-accurate simulator, and the LLVM compiler infrastructure. We have evaluated our framework on several open-source programs. Our experiments suggest that the framework is effective: it exposes numerous memory-performance issues in a reasonable time. We also show how our framework can be used to improve the performance of programs for embedded GPUs.
