On reducing energy-consumption by late-inserting instructions into the issue queue

In the presence of a long-latency instruction such as an L2 miss, the issue queue (IQ) may fill with instructions that depend on the miss; consequently, the IQ cannot expose instruction-level parallelism until the miss is resolved. In the context of memory-latency-tolerant processors, we propose delaying the insertion into the IQ of instructions that depend on loads predicted to miss in L2. These instructions are stored in an instruction buffer instead of being inserted into the IQ. Once the L2 miss is resolved, the dependent instructions are inserted into the IQ. Results show that the proposal reduces the total number of replays by between 37% (integer benchmarks) and 61% (floating-point benchmarks), that the average performance degradation is at most 2%, and that the average overall chip energy consumption is reduced by around 8% on the floating-point benchmarks.
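
The dispatch policy described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the names `Instr`, `LateInsertDispatcher`, `predicts_l2_miss`, and `resolve_miss` are hypothetical, the L2-miss predictor is reduced to a caller-supplied function, and issue, replay, and buffer-capacity limits are not modelled. The sketch only shows the steering decision: an instruction whose source registers depend, directly or transitively, on a load predicted to miss in L2 goes to the instruction buffer, and it moves to the IQ once every such load has been resolved.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Instr:
    name: str                     # unique label, also used to identify loads
    dest: Optional[str]           # destination register, if any
    srcs: Tuple[str, ...] = ()    # source registers
    is_load: bool = False


class LateInsertDispatcher:
    """Steers instructions to the IQ or to a delay buffer (illustrative only)."""

    def __init__(self, predicts_l2_miss: Callable[[Instr], bool]):
        self.predicts_l2_miss = predicts_l2_miss  # hypothetical predictor hook
        self.issue_queue = []                     # instructions inserted into the IQ
        self.delay_buffer = deque()               # (blocking loads, instruction) pairs
        self.pending = {}                         # register -> set of unresolved loads

    def dispatch(self, instr: Instr) -> str:
        # Collect every predicted-miss load that a source register depends on.
        blockers = set().union(*(self.pending.get(s, set()) for s in instr.srcs))
        if blockers:
            # Dependent on an outstanding predicted miss: keep it out of the IQ.
            self.delay_buffer.append((blockers, instr))
            if instr.dest is not None:
                self.pending[instr.dest] = set(blockers)  # dependence propagates
            return "buffer"
        self.issue_queue.append(instr)
        if instr.dest is not None:
            if instr.is_load and self.predicts_l2_miss(instr):
                self.pending[instr.dest] = {instr.name}
            else:
                self.pending.pop(instr.dest, None)  # register overwritten with ready data
        return "iq"

    def resolve_miss(self, load_name: str) -> None:
        # Clear the resolved load from the register dependence table.
        for reg in list(self.pending):
            self.pending[reg].discard(load_name)
            if not self.pending[reg]:
                del self.pending[reg]
        # Move instructions with no remaining blocking load into the IQ, in order.
        still_waiting = deque()
        for blockers, instr in self.delay_buffer:
            blockers.discard(load_name)
            if blockers:
                still_waiting.append((blockers, instr))
            else:
                self.issue_queue.append(instr)
        self.delay_buffer = still_waiting
```

In this sketch, dispatching a load `ld1` that is predicted to miss, followed by an instruction that reads its destination register, places the load in the IQ and the consumer in the buffer; calling `resolve_miss("ld1")` then moves the consumer into the IQ. Independent instructions are inserted into the IQ immediately, which is the property the proposal relies on to keep the IQ available for exposing instruction-level parallelism.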
