A study of control independence in superscalar processors

Control independence has been put forward as a significant new source of instruction-level parallelism for future-generation processors. However, its performance potential under practical hardware constraints is not known, and even less is understood about the factors that contribute to or limit the performance of control independence. Important aspects of control independence are identified and singled out for study, and a series of idealized machine models is used to isolate and evaluate these aspects. It is shown that much of the performance potential of control independence is lost due to data dependences and wasted resources consumed by incorrect control-dependent instructions. Even so, control independence can close the performance gap between real and perfect branch prediction by as much as half. Next, important implementation issues are discussed and some design alternatives are given. This is followed by a more detailed set of simulations, where the key implementation features are realistically modeled. These simulations show typical performance improvements of 10-30%.
