GPU Performance Estimation using Software Rasterization and Machine Learning

This paper introduces a predictive modeling framework to estimate the performance of GPUs during pre-silicon design. Early-stage performance prediction is useful when simulation times impede development by rendering driver performance validation, API conformance testing and design space explorations infeasible. Our approach builds a Random Forest regression model to analyze DirectX 3D workload behavior when executed by a software rasterizer, which we have extended with a workload characterizer to collect further performance information via program counters. In addition to regression models, this work produces detailed feature rankings which can provide valuable architectural insight, and accurate performance estimates for an Intel integrated Skylake generation GPU. Our models achieve reasonable out-of-sample-error rates of 14%, with an average simulation speedup of 327x.

[1]  Xiaojin Zhu,et al.  Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[2]  Derek Chiou,et al.  GPGPU performance and power estimation using machine learning , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).

[3]  Thomas F. Wenisch,et al.  SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, ISCA '03.

[4]  Hai Jin,et al.  Accelerating GPGPU architecture simulation , 2013, SIGMETRICS '13.

[5]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[6]  Shuaiwen Song,et al.  A Simplified and Accurate Model of Power-Performance Efficiency on Emergent GPU Architectures , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.

[7]  David M. Brooks,et al.  Accurate and efficient regression modeling for microarchitectural performance and power prediction , 2006, ASPLOS XII.

[8]  Carlos González,et al.  ATTILA: a cycle-level execution-driven simulator for modern GPU architectures , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.

[9]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[10]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[11]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[12]  Bin Li,et al.  Performance and Power Analysis of ATI GPU: A Statistical Approach , 2011, 2011 IEEE Sixth International Conference on Networking, Architecture, and Storage.

[13]  Andreas Gerstlauer,et al.  Accurate phase-level cross-platform power and performance estimation , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[14]  Martin Schulz,et al.  Adaptive Configuration Selection for Power-Constrained Heterogeneous Systems , 2014, 2014 43rd International Conference on Parallel Processing.

[15]  David R. Kaeli,et al.  Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[16]  Franz Franchetti,et al.  Accelerating Architectural Simulation Via Statistical Techniques: A Survey , 2016, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[17]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[18]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[19]  Bin Li,et al.  Tree structured analysis on GPU power study , 2011, 2011 IEEE 29th International Conference on Computer Design (ICCD).

[20]  Dam Sunwoo,et al.  FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators , 2007, MICRO.

[21]  George Kurian,et al.  Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[22]  Robert B. Ash,et al.  Information Theory , 2020, The SAGE International Encyclopedia of Mass Media and Society.

[23]  Stijn Eyerman,et al.  Interval simulation: Raising the level of abstraction in architectural simulation , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.

[24]  H. Akaike A new look at the statistical model identification , 1974 .

[25]  Andy Liaw,et al.  Classification and Regression by randomForest , 2007 .

[26]  Xiaohan Ma,et al.  Statistical Power Consumption Analysis and Modeling for GPU-based Computing , 2011 .

[27]  Henry Wong,et al.  Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.

[28]  Wolfgang Rosenstiel,et al.  Source level performance simulation of GPU cores , 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[29]  Sally A. McKee,et al.  Efficiently exploring architectural design spaces via predictive modeling , 2006, ASPLOS XII.