An Instruction Throughput Model of Superscalar Processors

Advances in semiconductor technology enable larger processor design spaces, leading to increasingly complex systems. At an initial stage, designers must evaluate many architecture design points to achieve a suitable design. Currently, most architecture exploration is performed using cycle accurate simulators. Although accurate, these tools are slow, thus limiting a comprehensive design search. The vast design space of today's complex processors and time to market economic pressures motivate the need for faster architectural evaluation methods. This paper presents a superscalar processor performance model that enables rapid exploration of the architecture design space for superscalar processors. It supplements current design tools by quickly identifying promising areas for more thorough and time consuming exploration with traditional tools. The model estimates the instruction throughput of a superscalar processor based on early architectural design parameters and application properties. It has been validated with the SimpleScalar out-of-order simulator. The core of the model, which executes 1.6 million times faster, produces instruction throughput estimates that are with within 5.5 percent of the corresponding SimpleScalar values.

[1]  Timothy Sherwood,et al.  Wavelet-based phase classification , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[2]  Mikko H. Lipasti,et al.  Exceeding the dataflow limit via value prediction , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[3]  Lieven Eeckhout,et al.  Accurate memory data flow modeling in statistical simulation , 2006, ICS '06.

[4]  Norman P. Jouppi,et al.  The Nonuniform Distribution of Instruction-Level and Machine Parallelism and Its Effect on Performance , 1989, IEEE Trans. Computers.

[5]  Tarek M. Taha,et al.  A parallelism, instruction throughput, and cycle time model of computer architectures , 2002 .

[6]  Kapil Vaswani,et al.  A Predictive Performance Model for Superscalar Processors , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[7]  David J. Lilja,et al.  Simulation of computer architectures: simulators, benchmarks, methodologies, and recommendations , 2006, IEEE Transactions on Computers.

[8]  Brad Calder,et al.  Using Machine Learning to Guide Architecture Simulation , 2006, J. Mach. Learn. Res..

[9]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[10]  R. Nagarajan,et al.  A design space evaluation of grid processor architectures , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[11]  Stijn Eyerman,et al.  Efficient Design Space Exploration of High Performance Embedded Out-of-Order Processors , 2006, Proceedings of the Design Automation & Test in Europe Conference.

[12]  Todd M. Austin,et al.  Performance Simulation Tools , 2002, Computer.

[13]  Erik Hagersten,et al.  StatCache: a probabilistic approach to efficient and accurate data locality analysis , 2004, IEEE International Symposium on - ISPASS Performance Analysis of Systems and Software, 2004.

[14]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[15]  Thomas F. Wenisch,et al.  SimFlex: Statistical Sampling of Computer System Simulation , 2006, IEEE Micro.

[16]  Dean M. Tullsen,et al.  Compiling for instruction cache performance on a multithreaded architecture , 2002, MICRO.

[17]  Stéphan Jourdan,et al.  Exploring instruction-fetch bandwidth requirement in wide-issue superscalar processors , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).

[18]  Chen Ding,et al.  Miss Rate Prediction Across Program Inputs and Cache Configurations , 2007, IEEE Transactions on Computers.

[19]  Karthikeyan Sankaralingam,et al.  A design space evaluation of grid processor architectures , 2001, MICRO.

[20]  Yale N. Patt,et al.  One Billion Transistors, One Uniprocessor, One Chip , 1997, Computer.

[21]  Yongxin Zhu,et al.  Modeling architectural improvements in superscalar processors , 2000, Proceedings Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region.

[22]  Erik Hagersten,et al.  A statistical multiprocessor cache model , 2006, 2006 IEEE International Symposium on Performance Analysis of Systems and Software.

[23]  Sang Bang Choi,et al.  The Effect of Instruction Window on the Performance of Superscalar Processors(Special Section of Papers Selected from ITC-CSCC'97) , 1998 .

[24]  Michael Gschwind,et al.  Integrated analysis of power and performance for pipelined microprocessors , 2004, IEEE Transactions on Computers.

[25]  Lieven Eeckhout,et al.  Control flow modeling in statistical simulation for accurate and efficient processor design studies , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[26]  S SohiGurindar Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers , 1990 .

[27]  Sang Bang Choi,et al.  System Performance Analyses of Out-of-Order Superscalar Processors Using Analytical Method (Special Section of Papers Selected from ITC-CSCC '98) , 1999 .

[28]  Jingling Xue,et al.  Efficient and accurate analytical modeling of whole-program data cache behavior , 2004, IEEE Transactions on Computers.

[29]  D. Burger,et al.  Billion-Transistor Architectures , 1997, Computer.

[30]  L. Eeckhout,et al.  Increasing the accuracy of statistical simulation for modeling superscalar processors , 2001, Conference Proceedings of the 2001 IEEE International Performance, Computing, and Communications Conference (Cat. No.01CH37210).

[31]  David M. Brooks,et al.  Accurate and efficient regression modeling for microarchitectural performance and power prediction , 2006, ASPLOS XII.

[32]  M. J. Serrano Performance estimation in a simultaneous multithreading processor , 1996, Proceedings of MASCOTS '96 - 4th International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems.

[33]  Douglas M. Hawkins,et al.  Improving computer architecture simulation methodology by adding statistical rigor , 2005, IEEE Transactions on Computers.

[34]  James E. Smith,et al.  Complexity-Effective Superscalar Processors , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[35]  Frederic T. Chong,et al.  HLS: combining statistical and symbolic simulation to guide microprocessor designs , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[36]  Tarek M. Taha,et al.  An Instruction Throughput Model of Superscalar Processors , 2008, IEEE Trans. Computers.

[37]  Doug Burger,et al.  Measuring Experimental Error in Microprocessor Simulation , 2001, ISCA 2001.

[38]  James E. Smith,et al.  A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[39]  Allan Tzeng,et al.  UltraSPARC-II/: expanding the boundaries of a system on a chip , 1998, IEEE Micro.

[40]  John Paul Shen,et al.  A framework for statistical modeling of superscalar processor performance , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.

[41]  James E. Smith,et al.  Statistical Simulation: Adding Efficiency to the Computer Designer's Toolbox , 2003, IEEE Micro.

[42]  James E. Smith,et al.  Statistical simulation of symmetric multiprocessor systems , 2002, Proceedings 35th Annual Simulation Symposium. SS 2002.

[43]  John Paul Shen,et al.  Theoretical modeling of superscalar processor performance , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[44]  Brad Calder,et al.  SimPoint 3.0: Faster and More Flexible Program Phase Analysis , 2005, J. Instr. Level Parallelism.

[45]  Michael D. Smith,et al.  Geust Editorial: Media processing: a new design target , 1996, IEEE Micro.

[46]  Daniel J. Pease,et al.  An analytical model for trace cache instruction fetch performance , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.

[47]  Kapil Vaswani,et al.  Construction and use of linear regression models for processor performance analysis , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[48]  Sally A. McKee,et al.  Efficiently exploring architectural design spaces via predictive modeling , 2006, ASPLOS XII.

[49]  Michael J. Flynn,et al.  Instruction Window Size Trade-Offs and Characterization of Program Parallelism , 1994, IEEE Trans. Computers.