Profiling driven computation reuse: an embedded software synthesis technique for energy and performance optimization

It has been observed that even highly optimized software programs perform "redundant" computations during their execution, due to the nature (statistics) of the values assumed by input or internal program variables. For embedded software running on battery-powered systems, such computations can be viewed as unnecessary energy overheads, and hence represent opportunities for improvement in energy efficiency. We present a systematic methodology to identify and eliminate redundancies in the computations performed by embedded software programs, by exploiting opportunities for computation reuse that arise dynamically. We report the results of experiments on two different embedded systems: a detailed simulation model of a Fujitsu SPARClite-based embedded system, and actual current measurements on a Compaq iPAQ PDA. Our results demonstrate that the proposed technique can reduce energy by up to 46.9% (average of 21.2% and 13.9% for the SPARClite-based system and the iPAQ, respectively) while simultaneously improving performance by up to 45.8% (average of 20.7% and 16.8% for the SPARClite-based system and the iPAQ, respectively), compared to well-optimized programs that do not employ such a technique.
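The general idea of computation reuse can be illustrated with a small sketch. The code below is not the methodology of the paper; it is a minimal example, assuming a pure function whose inputs repeat often, of how a small software lookup table can return a stored result instead of recomputing it. The names `ReuseTable` and `expensive_kernel`, the table size, and the input sequence are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Optional, Tuple


class ReuseTable:
    """Small direct-mapped table that caches results of a pure function.

    Hypothetical illustration of computation reuse; not the paper's method.
    """

    def __init__(self, func: Callable[[int], int], size: int = 16) -> None:
        self.func = func
        self.size = size
        # Each slot holds (input, result) or None if it is empty.
        self.entries: List[Optional[Tuple[int, int]]] = [None] * size
        self.hits = 0
        self.misses = 0

    def __call__(self, x: int) -> int:
        slot = x % self.size
        entry = self.entries[slot]
        if entry is not None and entry[0] == x:
            # Same input seen before: reuse the stored result.
            self.hits += 1
            return entry[1]
        # New input (or slot conflict): compute and overwrite the slot.
        self.misses += 1
        result = self.func(x)
        self.entries[slot] = (x, result)
        return result


def expensive_kernel(x: int) -> int:
    """Stand-in for a costly, side-effect-free computation (assumed)."""
    acc = 0
    for i in range(1000):
        acc = (acc * 31 + x + i) % 65521
    return acc


# A skewed input stream: a few values recur, so most calls can be reused.
inputs = [3, 7, 3, 3, 7, 42, 3, 7]
cached = ReuseTable(expensive_kernel, size=8)
outputs = [cached(x) for x in inputs]
```

In this sketch only three of the eight calls execute the kernel. Whether such a table pays off in practice depends on how often inputs repeat relative to the cost of the lookup itself, which is the kind of information that value profiling is meant to supply.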
