Instruction packing: reducing power and delay of the dynamic scheduling logic

The instruction scheduling logic used in modern superscalar microprocessors often relies on associative searching of the issue queue entries to dynamically wake up instructions for execution. Traditional designs use one issue queue entry for each instruction, regardless of the actual number of operands actively used in the wakeup process. In this paper we propose instruction packing, a novel microarchitectural technique that reduces both the delay and the power consumption of the issue queue by sharing the associative part of an issue queue entry between two instructions, each with at most one nonready register source operand at the time of dispatch. Our results show that instruction packing provides a 39% reduction in overall issue queue power and a 21.6% reduction in wakeup delay, with as little as 0.4% IPC degradation on average across the simulated SPEC benchmarks.
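The sharing rule described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the paper's circuit or simulator: the names `Instr`, `Entry`, and `PackedIssueQueue` are hypothetical, each entry is assumed to have two tag comparators, select logic is not modeled (every ready instruction issues immediately), and an instruction that is already fully ready at dispatch simply waits for the next broadcast.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Instr:
    name: str
    # Tags of source registers that are NOT ready at dispatch time.
    waiting: Set[int] = field(default_factory=set)


@dataclass
class Entry:
    instrs: List[Instr] = field(default_factory=list)
    # True when occupants use one comparator each, so the entry can hold two.
    shared: bool = False


class PackedIssueQueue:
    """Hypothetical behavioral model of an issue queue with packing."""

    def __init__(self, num_entries: int):
        self.num_entries = num_entries
        self.entries: List[Entry] = []

    def dispatch(self, instr: Instr) -> bool:
        if len(instr.waiting) > 2:
            raise ValueError("at most two source operands are modeled")
        # Packing condition from the abstract: at most one nonready source.
        packable = len(instr.waiting) <= 1
        if packable:
            for entry in self.entries:
                if entry.shared and len(entry.instrs) == 1:
                    entry.instrs.append(instr)  # share the associative part
                    return True
        if len(self.entries) == self.num_entries:
            return False  # queue full: dispatch would stall
        self.entries.append(Entry([instr], shared=packable))
        return True

    def wakeup(self, tag: int) -> List[str]:
        """Broadcast a result tag and return names of instructions that issue."""
        issued: List[str] = []
        for entry in self.entries:
            for ins in entry.instrs:
                ins.waiting.discard(tag)
            issued += [ins.name for ins in entry.instrs if not ins.waiting]
            entry.instrs = [ins for ins in entry.instrs if ins.waiting]
        # Free entries whose occupants have all issued.
        self.entries = [e for e in self.entries if e.instrs]
        return issued
```

In this model, two instructions that each wait on a single tag occupy one entry, while an instruction waiting on two tags takes a whole entry, as in a traditional design. A two-entry queue can therefore hold up to four instructions when all of them are packable, which is the capacity effect that lets a smaller associative structure serve the same window.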
