Efficient Design Space Exploration of GPGPU Architectures

The goal of this work is to revisit GPU design and introduce a fast, low-cost, and effective approach to optimizing resource allocation in future GPUs. We achieve this goal by using the Plackett-Burman methodology to explore the design space efficiently. We further formulate design exploration as a constrained optimization problem. Our approach produces the optimal configuration in 84% of the cases; when it does not, it produces the second-best configuration, with a performance penalty of less than 3.5%. Our method also reduces the number of explorations that must be performed by as much as 78%.
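To illustrate the kind of screening a Plackett-Burman design provides, the following minimal sketch builds the standard 12-run, two-level design for up to 11 factors and ranks the factors by their estimated main effects. The function names, the mapping of columns to GPU parameters, and the synthetic responses are assumptions made for illustration only; they are not taken from this work.

```python
# Hypothetical sketch of Plackett-Burman screening (not this work's code).
# Each column is one design parameter, each row is one simulated configuration;
# +1 / -1 denote the high / low setting of that parameter.

# Standard generator row for the 12-run Plackett-Burman design.
PB12_GENERATOR = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]


def pb12_design():
    """Return the 12 x 11 design: 11 cyclic shifts plus an all-low row."""
    n = len(PB12_GENERATOR)
    rows = [[PB12_GENERATOR[(col - shift) % n] for col in range(n)]
            for shift in range(n)]
    rows.append([-1] * n)
    return rows


def main_effects(design, responses):
    """Estimate each factor's effect as the sign-weighted sum of responses."""
    n_factors = len(design[0])
    return [sum(row[j] * y for row, y in zip(design, responses))
            for j in range(n_factors)]


def rank_factors(effects):
    """Order factor indices from most to least influential."""
    return sorted(range(len(effects)), key=lambda j: abs(effects[j]), reverse=True)


if __name__ == "__main__":
    design = pb12_design()
    # Synthetic response: only factors 0 and 4 matter (assumed for the demo).
    responses = [10 + 3 * row[0] - 2 * row[4] for row in design]
    effects = main_effects(design, responses)
    print(rank_factors(effects)[:2])
```

With 11 two-level parameters, an exhaustive sweep would need 2^11 = 2048 configurations, whereas this design needs 12 runs to rank the parameters by influence. This is the general reason such screening reduces exploration cost.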
