Reducing the Number of Bits in the BTB to Attack the Branch Predictor Hot-Spot

Current superscalar processors access the branch target buffer (BTB) early in the pipeline to anticipate the target address of branches and jumps. These accesses are frequent and aggressive, since the BTB is looked up every cycle for all instructions in the instruction-cache (ICache) line being fetched. The resulting high power density can create hot spots, which in turn raise packaging and cooling costs. Most of the BTB's power is consumed by its two main fields, the tag and the target address. Shortening either field reduces power consumption, silicon area, and access time. This paper analyzes to what extent the tag and target address lengths can be reduced to lower both dynamic and static power consumption, silicon area, and access time while sustaining performance. Experimental results show that the tag length can be reduced by about half, and the target address by one byte, with no performance loss. When both reductions are combined, BTB peak power savings can reach about 35%, effectively attacking the hot spot.
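The two reductions can be illustrated with a small behavioral model. This is a minimal Python sketch under assumed parameters (32-bit addresses, 4-byte instructions, a 512-entry direct-mapped BTB). The class name `ReducedBTB`, the bit widths, and the policy of rebuilding the dropped high-order target bits from the branch's own PC are illustrative assumptions, not the paper's design or simulation setup. It shows the two ways shortened fields can cost accuracy: a partial tag lets unrelated branches alias to the same entry (a false hit), and a truncated target is wrong whenever the branch and its target differ in the dropped high-order bits.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical configuration (assumed for illustration, not from the paper).
ADDR_BITS = 32                                         # address width
OFFSET_BITS = 2                                        # 4-byte instructions: low 2 bits always 0
INDEX_BITS = 9                                         # 512-entry direct-mapped BTB
FULL_TAG_BITS = ADDR_BITS - OFFSET_BITS - INDEX_BITS   # 21
FULL_TARGET_BITS = ADDR_BITS - OFFSET_BITS             # 30


def mask(n: int) -> int:
    """Bit mask with the n low-order bits set."""
    return (1 << n) - 1


@dataclass
class Entry:
    tag: int     # partial tag: only the low tag_bits of the full tag
    target: int  # only the low target_bits of the word-aligned target


class ReducedBTB:
    """Direct-mapped BTB that stores partial tags and truncated targets."""

    def __init__(self, tag_bits: int = FULL_TAG_BITS,
                 target_bits: int = FULL_TARGET_BITS) -> None:
        self.tag_bits = tag_bits
        self.target_bits = target_bits
        self.entries: list[Optional[Entry]] = [None] * (1 << INDEX_BITS)

    def _split(self, pc: int) -> tuple[int, int]:
        word = pc >> OFFSET_BITS
        index = word & mask(INDEX_BITS)
        tag = (word >> INDEX_BITS) & mask(self.tag_bits)  # drop high tag bits
        return index, tag

    def update(self, pc: int, target: int) -> None:
        index, tag = self._split(pc)
        low_target = (target >> OFFSET_BITS) & mask(self.target_bits)
        self.entries[index] = Entry(tag, low_target)

    def lookup(self, pc: int) -> Optional[int]:
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is None or entry.tag != tag:
            return None  # BTB miss
        # Assumed policy: reuse the branch PC's high-order bits for the
        # target bits that were not stored.
        high = ((pc >> OFFSET_BITS) >> self.target_bits) << self.target_bits
        return (high | entry.target) << OFFSET_BITS

    def bits_per_entry(self) -> int:
        return self.tag_bits + self.target_bits


if __name__ == "__main__":
    full = ReducedBTB()
    small = ReducedBTB(tag_bits=11, target_bits=22)   # ~half tag, target minus one byte
    print(full.bits_per_entry(), small.bits_per_entry())  # 51 33
    pc_a, pc_b = 0x00401000, 0x00801000  # same index, same low 11 tag bits
    for btb in (full, small):
        btb.update(pc_a, 0x00402000)
    print(full.lookup(pc_b), hex(small.lookup(pc_b)))     # None 0x402000 (false hit)
```

With these assumed widths each entry shrinks from 51 to 33 stored bits. Bit count is only a rough proxy; the power, area, and access-time figures in the abstract come from the paper's own evaluation.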