A New Recovery Mechanism in Superscalar Microprocessors by Recovering Critical Misprediction

Current trends in modern out-of-order processors involve implementing deeper pipelines and a large instruction window to achieve high performance, which lead to the penalty of the branch misprediction recovery being a critical factor in overall processor performance. Multi path execution is proposed to reduce this penalty by executing both paths following a branch, simultaneously. However, there are some drawbacks in this mechanism, such as design complexity caused by processing both paths after a branch and performance degradation due to hardware resource competition between two paths. In this paper, we propose a new recovery mechanism, called Recovery Critical Misprediction (RCM), to reduce the penalty of branch misprediction recovery. The mechanism uses a small trace cache to save the decoded instructions from the alternative path following a branch. Then, during the subsequent predictions, the trace cache is accessed. If there is a hit, the processor forks the second path of this branch at the renamed stage so that the design complexity in the fetch stage and decode stage is alleviated. The most contribution of this paper is that our proposed mechanism employs critical path prediction to identify the branches that will be most harmful if mispredicted. Only the critical branch can save its alternative path into the trace cache, which not only increases the usefulness of a limited size of trace cache but also avoids the performance degradation caused by the forked non-critical branch. Experimental results employing SPECint 2000 benchmark show that a processor with our proposed RCM improves IPC value by 10.05% compared with a conventional processor.

[1]  Todd M. Austin,et al.  The SimpleScalar tool set, version 2.0 , 1997, CARN.

[2]  Shlomo Weiss,et al.  Hiding the misprediction penalty of a resource-efficient high-performance processor , 2008, TACO.

[3]  Dirk Grunwald,et al.  Selective eager execution on the PolyPath architecture , 1998, ISCA.

[4]  Jack L. Lo,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[5]  Kenneth C. Yeager The Mips R10000 superscalar microprocessor , 1996, IEEE Micro.

[6]  E. Smith,et al.  Selective Dual Path Execution , 1996 .

[7]  Eric Rotenberg,et al.  Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[8]  Haitham Akkary,et al.  An analysis of a resource efficient checkpoint architecture , 2004, TACO.

[9]  Avi Mendelson,et al.  Filtering techniques to improve trace-cache efficiency , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[10]  Peng Zhou,et al.  Fast branch misprediction recovery in out-of-order superscalar processors , 2005, ICS '05.

[11]  Eric Sprangle,et al.  Increasing processor performance by implementing deeper pipelines , 2002, ISCA.

[12]  Daniel J. Pease,et al.  Trace Cache performance parameters , 2002, Proceedings. IEEE International Conference on Computer Design: VLSI in Computers and Processors.

[13]  Shai Rubin,et al.  Focusing processor policies via critical-path prediction , 2001, ISCA 2001.

[14]  Brad Calder,et al.  Picking statistically valid and early simulation points , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[15]  Mateo Valero,et al.  Trace cache redundancy: red and blue traces , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[16]  José González,et al.  Dual path instruction processing , 2002, ICS '02.

[17]  Eric Rotenberg,et al.  Assigning confidence to conditional branch predictions , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.