Power optimization in heterogeneous datapaths

Heterogeneous datapaths maximize the utilization of functional units (FUs) by customizing their widths individually through fragmentation of wide operands. By comparison, slices of the large functional units in a homogeneous datapath may spend many cycles performing no useful work. Various fragmentation techniques have demonstrated benefits in minimizing total functional unit area. On closer inspection, we observe that the area savings achieved by heterogeneous datapaths can be traded off for power optimization. Our approach introduces functional unit alternatives with different power/area trade-offs across the fragmentation and allocation choices, in order to reduce power consumption while satisfying the area constraint imposed on the heterogeneous datapath. Because the low-power FUs in the literature incur an area penalty, a methodology is needed to introduce them into the high-level synthesis (HLS) flow while complying with the area constraint. We propose allocation and module selection algorithms that pursue a trade-off between area and power consumption for fragmented datapaths under a total area constraint. Results show that power can be reduced by 37% on average (49% in the best case). Moreover, latency and cycle time are equal or nearly equal to those of the baseline, which leads to an energy reduction as well.
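To make the module selection problem concrete, the following minimal sketch picks one implementation per FU so that total power is minimized while total area stays within a budget. It is not the algorithm proposed in this work: it is a generic multiple-choice knapsack formulation solved by dynamic programming, and it assumes that areas are integers in arbitrary units and that power is additive across FUs. The function name `select_modules`, the variant names, and all area and power figures are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# (variant name, area in arbitrary integer units, power in arbitrary units)
# All names and numbers below are hypothetical illustration data.
Variant = Tuple[str, int, float]


def select_modules(
    fus: List[List[Variant]], area_budget: int
) -> Optional[Tuple[float, List[str]]]:
    """Choose one variant per FU, minimizing total power with area <= budget.

    Returns (total_power, chosen_variant_names), or None if no selection fits.
    """
    # best[a] holds the lowest-power selection found so far using exactly area a.
    best: Dict[int, Tuple[float, List[str]]] = {0: (0.0, [])}
    for variants in fus:
        nxt: Dict[int, Tuple[float, List[str]]] = {}
        for used, (power, picks) in best.items():
            for name, area, p in variants:
                a = used + area
                if a > area_budget:
                    continue  # this choice would violate the area constraint
                cand = (power + p, picks + [name])
                if a not in nxt or cand[0] < nxt[a][0]:
                    nxt[a] = cand
        best = nxt
    if not best:
        return None
    return min(best.values(), key=lambda t: t[0])


# Two fragmented FUs, each with a baseline and a larger low-power variant.
fus = [
    [("add8_base", 10, 5.0), ("add8_lp", 14, 3.0)],
    [("mul8_base", 40, 20.0), ("mul8_lp", 52, 12.0)],
]

if __name__ == "__main__":
    for budget in (66, 64, 45):
        print(budget, select_modules(fus, budget))
```

With these illustrative figures, a budget of 66 area units admits both low-power variants. A budget of 64 leaves room only for the low-power multiplier, where the saving is larger. A budget of 45 cannot fit even the two baseline units, so no selection is returned.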
