Multiobjective GPU design space exploration optimization

Abstract It has been more than a decade since general-purpose applications began targeting GPUs to benefit from the enormous processing power they offer. However, not all applications gain speedup when running on GPUs. If an application does not have enough parallel computation to hide memory latency, running it on a GPU will degrade performance compared to what it could achieve on a CPU. On the other hand, the efficiency that an application with a high level of parallelism can achieve on a GPU depends on how well the application’s memory and computational demands are balanced against the GPU’s resources. In this work we tackle the problem of finding a GPU configuration that performs well on a set of GPGPU applications. To achieve this, we propose two models. First, we study the design space of 20 GPGPU applications and show that the relationship between the architectural parameters of a GPU and the power and performance of the applications it runs can be learned by a Neural Network (NN). We propose application-specific NN-based predictors that train on 5% of the design space and predict the power and performance of the remaining 95% of configurations (the blind set). Although the models make accurate predictions, a few configurations remain whose power and performance are mispredicted. We propose a filtering heuristic that captures most of the predictions with large errors while marking only 5% of the configurations in the blind set as outliers. Using the models and the filtering heuristic, one obtains power and performance values for every configuration in an application’s design space. Searching the design space for a set of configurations that meet certain restrictions on power and performance can be a tedious task, as some applications have large design spaces. In the second model, we therefore employ the Pareto front multiobjective optimization technique to obtain the subset of the design space that runs the application optimally in terms of power and performance. We show that the optimum configurations predicted by our model are very close to the actual optimum configurations. While this method gives the optimum configurations for each application, given a set of GPGPU applications one may instead look for a single configuration that performs well across all of them. Therefore, we propose a method to find such a configuration with respect to different performance objectives.
