Reducing Power and Energy Overhead in Instruction Prefetching for Embedded Processor Systems

Instruction prefetching is an effective way to improve the performance of pipelined processors. However, existing instruction prefetching schemes improve performance at a significant energy cost, making them unsuitable for embedded and ubiquitous systems, where high performance and low energy consumption are both required. This paper proposes reducing the energy overhead of instruction prefetching by using a simple hardware/software design and an efficient prefetching operation scheme. Two approaches are investigated: Decoded Loop Instruction Cache based Prefetching (DLICP), which is most effective for loop-intensive applications, and an enhanced DLICP that combines DLICP with the widely used Next Line Prefetching (NLP) for applications with a moderate number of loops. The experimental results show that both DLICP and the enhanced DLICP deliver improved performance at a much lower energy overhead.
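For readers unfamiliar with the NLP baseline named above, the following is a minimal sketch of next-line prefetching on a toy direct-mapped instruction cache. It is an illustration only and does not model DLICP, whose design the abstract does not detail. The class `ToyICache`, the function `run`, the cache geometry, and the traces are all hypothetical, and the prefetch count is used here merely as a crude stand-in for the extra memory activity that costs energy.

```python
class ToyICache:
    """Toy direct-mapped instruction cache (hypothetical model, tags only)."""

    def __init__(self, num_lines=8, line_bytes=16, next_line_prefetch=False):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.next_line_prefetch = next_line_prefetch
        self.slots = [None] * num_lines  # slot index -> resident line number
        self.misses = 0       # demand fetches that missed in the cache
        self.prefetches = 0   # extra line fills issued by the prefetcher

    def _present(self, line):
        return self.slots[line % self.num_lines] == line

    def _fill(self, line):
        self.slots[line % self.num_lines] = line

    def fetch(self, pc):
        line = pc // self.line_bytes
        if not self._present(line):
            self.misses += 1
            self._fill(line)
        # Next-line prefetching: on every access to line i, make sure
        # line i + 1 is also resident, whether or not it will be used.
        if self.next_line_prefetch and not self._present(line + 1):
            self.prefetches += 1
            self._fill(line + 1)


def run(trace, **kwargs):
    """Replay a list of instruction addresses; return (misses, prefetches)."""
    cache = ToyICache(**kwargs)
    for pc in trace:
        cache.fetch(pc)
    return cache.misses, cache.prefetches


# Straight-line code: 64 four-byte instructions spanning 16 cache lines.
straight_line = [4 * i for i in range(64)]
baseline = run(straight_line)
with_nlp = run(straight_line, next_line_prefetch=True)

# Tight loop: 8 instructions (2 cache lines) executed 100 times.
tight_loop = [4 * i for i in range(8)] * 100
loop_baseline = run(tight_loop)
loop_with_nlp = run(tight_loop, next_line_prefetch=True)
```

In this toy model, prefetching removes almost all demand misses on straight-line code, but it also fills a line past the end of the code that is never executed. On the tight loop the baseline already misses only on the first iteration, so the prefetcher has little to gain there, which is the kind of case the loop-oriented approach in the abstract targets.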
