Power-efficient wakeup tag broadcast

The dynamic instruction scheduling logic is one of the most critical components of modern superscalar microprocessors, in terms of both delay and power dissipation. The delay and energy required to drive the wakeup tags across the associatively addressed issue queue account for a significant percentage of the scheduler's overhead and also limit design scalability. We propose two schemes, tag memoization and tagline folding, that reduce the power of wakeup tag broadcasts by reducing the number of tag bits driven in each broadcast. Our results show that the combination of these mechanisms provides a 223% average reduction of the wakeup tag broadcast power with no impact on the IPC.
