Low power engineering

Resource usage in embedded system platforms depends on application workload characteristics, the desired quality of service, and environmental conditions. In general, system workload is highly non-stationary because of the heterogeneous nature of information content. Quality of service depends on user requirements, which may change over time. In addition, both workload and quality of service can be affected by environmental conditions such as network congestion and wireless link quality.
