Power optimization using divide-and-conquer techniques for minimization of the number of operations

We develop an approach to minimizing the power consumption of portable wireless DSP applications using a set of compilation and architectural techniques. The key technical innovation is a novel divide-and-conquer compilation technique that minimizes the number of operations for general DSP computations. Our technique not only optimizes a significantly wider set of computations than previously published techniques, but also outperforms them, or performs at least as well, on all examples. Along the architectural dimension, we investigate the coordinated impact of compilation techniques on the number of processors that provides the optimal trade-off between cost and power. We demonstrate that proper compilation techniques can significantly reduce power at bounded hardware cost. The effectiveness of all techniques and algorithms is documented on numerous real-life designs.
