A high-performance low-power mesochronous pipeline architecture for computer systems

In a conventional pipeline scheme, each pipeline stage operates on only one data set at a time, and the clock period is proportional to the maximum pipeline stage delay. We propose a mesochronous pipeline scheme in which pipeline stages operate on multiple data sets simultaneously. In this scheme each stage contains more logic, and there are fewer stages than in a conventional pipeline. The clock period is proportional to the maximum difference in pipeline stage delays rather than to the maximum stage delay, so higher clock speeds are possible with significantly fewer pipeline stages. The clock distribution network in the mesochronous scheme is also simpler and more lightly loaded. A detailed analysis of the clock period constraints is provided to show the performance gain and speedup of mesochronous pipelining over other pipelining schemes. The overall current drawn in the mesochronous scheme is lower, resulting in significant power savings and a smaller IR drop on the power lines. The variation in supply current (di/dt) drawn by the clock network is also significantly lower, which reduces power supply noise. An 8×8-bit multiplier using the carry-save adder technique has been simulated with both the conventional and the mesochronous pipeline approach in a TSMC 180 nm process (drawn length 200 nm). The mesochronous pipelined multiplier operates with a clock period of 350 ps (2.86 GHz), a speedup of 1.7 over the conventional pipeline scheme, while requiring fewer pipeline stages and pipeline registers. Its overall power dissipation is less than 50% of that of the conventional pipelined multiplier. In the conventional implementation, the clock network and pipeline registers account for close to 80% of total power dissipation, whereas in the mesochronous implementation the logic accounts for the larger share.
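
The contrast between the two clock period constraints can be illustrated with a minimal sketch. It is not the paper's timing analysis: the lumped `overhead_ps` term (standing in for register delay, setup time, and skew), the function names, and the stage delays in the usage example are all assumptions made for illustration. The only figure taken from the abstract is the 350 ps clock period.

```python
def conventional_period(stages_ps, overhead_ps):
    """Simplified bound: the period tracks the slowest stage's maximum delay.

    stages_ps is a list of (min_delay, max_delay) tuples in picoseconds.
    overhead_ps is an assumed lumped term for register delay, setup and skew.
    """
    return max(d_max for _, d_max in stages_ps) + overhead_ps


def mesochronous_period(stages_ps, overhead_ps):
    """Simplified bound: the period tracks the largest (max - min) delay spread."""
    return max(d_max - d_min for d_min, d_max in stages_ps) + overhead_ps


def frequency_ghz(period_ps):
    """Convert a clock period in picoseconds to a frequency in GHz."""
    return 1000.0 / period_ps


if __name__ == "__main__":
    # Hypothetical stage delays (min, max) in ps; not taken from the paper.
    stages = [(400, 520), (380, 500)]
    overhead = 150  # hypothetical lumped overhead in ps

    print(conventional_period(stages, overhead))
    print(mesochronous_period(stages, overhead))
    # Sanity check of the abstract's figure: 350 ps corresponds to about 2.86 GHz.
    print(round(frequency_ghz(350), 2))
```

In this toy model the period shrinks when the spread between the fastest and slowest paths through a stage is small, even if the stage itself is deep. The sketch applies both bounds to the same stage list for simplicity, whereas the scheme described above uses fewer stages with more logic in each.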
