Performance improvement with circuit-level speculation

The performance of a current superscalar microprocessor depends on its clock frequency and on the number of useful instructions it can process per cycle (IPC). In this paper we propose a method called approximation to reduce the logic delay of a pipeline stage. The basic idea of approximation is to implement a logic function partially instead of fully. Most of the time the partial implementation gives the same result as the full implementation, but with fewer gate delays, which allows a higher pipeline frequency. We apply this method to three logic blocks. Simulation results show that the method provides some performance improvement for a wide-issue superscalar processor if these stages are finely pipelined.
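To make the idea of a partial implementation concrete, the sketch below models one possible instance: an adder in which the carry into each bit is computed from only a fixed number of lower-order bits, so the longest carry chain (and hence the gate depth) is bounded by that number rather than by the full word width. This is an illustrative assumption, not necessarily one of the three blocks evaluated in the paper; the names `approx_add`, `window`, and `match_rate` are hypothetical, and the model covers only how often the shortened logic agrees with the full function, not how a mismatch would be detected or handled.

```python
import random

def exact_add(a, b, width):
    """Full implementation: ordinary addition truncated to `width` bits."""
    return (a + b) & ((1 << width) - 1)

def approx_add(a, b, width, window):
    """Partial implementation (hypothetical): the carry into bit i is
    derived from only the `window` bits directly below it, assuming no
    carry enters that window from further down."""
    result = 0
    for i in range(width):
        lo = max(0, i - window)          # lowest bit the carry logic looks at
        span = i - lo                    # number of bits actually inspected
        mask = (1 << span) - 1
        a_seg = (a >> lo) & mask
        b_seg = (b >> lo) & mask
        carry = (a_seg + b_seg) >> span  # carry out of bits [lo, i)
        bit = ((a >> i) ^ (b >> i) ^ carry) & 1
        result |= bit << i
    return result

def match_rate(width, window, trials=10000, seed=0):
    """Fraction of uniformly random operand pairs for which the partial
    adder returns the same sum as the full adder."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        if approx_add(a, b, width, window) == exact_add(a, b, width):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for window in (4, 8, 16, 32):
        print(window, match_rate(32, window))
```

When `window` is at least the word width, the model reduces to the full adder. Smaller windows trade a shorter carry chain for occasional wrong sums on operands that contain a long run of propagating bits, such as 0b0111 + 0b0001 with a window of one bit.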
