Extending the Cell SPE with Energy Efficient Branch Prediction

Energy-efficient dynamic branch predictors are proposed for the Cell SPE, which normally relies on compiler-inserted hint instructions to predict branches. All proposed schemes use a Branch Target Buffer (BTB) that stores the branch target address together with the prediction, which is computed using a bimodal counter. One prediction scheme predecodes instructions as they are fetched from the local store and accesses the BTB only for branch instructions, thereby saving power compared to conventional dynamic predictors, which access the BTB for every instruction. In addition, several ways to leverage the existing hint instructions for the dynamic branch predictor are studied. We also introduce branch warning instructions, which initiate branch prediction before the actual branch instruction is fetched. This allows the instructions at the branch target to be fetched early and thus completely removes the branch penalty for correctly predicted branches. For a 256-entry BTB, a speedup of up to 18.8% is achieved. The power consumption of the branch prediction schemes is estimated at 1% or less of the total power dissipation of the SPE, and the average energy-delay product is reduced by up to 6.2%.
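The idea of a BTB with a bimodal counter that is accessed only for predecoded branches can be illustrated with a minimal sketch. This is not the authors' design: the direct-mapped organization, the 2-bit counter width, the allocate-on-taken policy, and all names (`BimodalBTB`, `run`, `demo_trace`) are assumptions made for illustration only. The number of BTB lookups serves as a rough stand-in for BTB access energy.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BTBEntry:
    tag: int
    target: int
    counter: int  # assumed 2-bit saturating counter: 0..3, >= 2 predicts taken


class BimodalBTB:
    """Hypothetical direct-mapped BTB holding a target and a bimodal counter."""

    def __init__(self, entries: int = 256):
        self.entries = entries
        self.table: List[Optional[BTBEntry]] = [None] * entries
        self.accesses = 0  # lookups, used here as a proxy for access energy

    def _index_tag(self, pc: int) -> Tuple[int, int]:
        word = pc >> 2  # assumes 4-byte instructions
        return word % self.entries, word // self.entries

    def predict(self, pc: int) -> Optional[int]:
        """Return the predicted target, or None for a miss or not-taken."""
        self.accesses += 1
        idx, tag = self._index_tag(pc)
        entry = self.table[idx]
        if entry is not None and entry.tag == tag and entry.counter >= 2:
            return entry.target
        return None

    def update(self, pc: int, taken: bool, target: int) -> None:
        idx, tag = self._index_tag(pc)
        entry = self.table[idx]
        if entry is None or entry.tag != tag:
            if taken:  # allocate only on taken branches (an assumption)
                self.table[idx] = BTBEntry(tag, target, 2)
            return
        if taken:
            entry.counter = min(3, entry.counter + 1)
            entry.target = target
        else:
            entry.counter = max(0, entry.counter - 1)


def run(trace, predecode: bool) -> Tuple[int, int, int]:
    """trace holds (pc, is_branch, taken, target) tuples.

    Returns (BTB accesses, correct predictions, branch count).
    """
    btb = BimodalBTB()
    correct = branches = 0
    for pc, is_branch, taken, target in trace:
        if predecode and not is_branch:
            continue  # predecode identified a non-branch: skip the BTB lookup
        predicted = btb.predict(pc)
        if is_branch:
            branches += 1
            actual = target if taken else None
            correct += int(predicted == actual)
            btb.update(pc, taken, target)
    return btb.accesses, correct, branches


def demo_trace(iterations: int = 10):
    """Toy loop: three non-branch instructions, then a backward branch."""
    trace = []
    for i in range(iterations):
        for pc in (0x100, 0x104, 0x108):
            trace.append((pc, False, False, 0))
        trace.append((0x10C, True, i < iterations - 1, 0x100))
    return trace


if __name__ == "__main__":
    print("every instruction:", run(demo_trace(), predecode=False))
    print("branches only:    ", run(demo_trace(), predecode=True))
```

On this toy trace both modes predict 8 of the 10 branches correctly, but the predecoded variant performs 10 BTB lookups instead of 40. The figures describe only this made-up loop and say nothing about the speedup or power results reported above.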
