A scalable counterflow-pipelined asynchronous radix-4 Booth multiplier

This paper introduces an asynchronous radix-4 Booth multiplier architecture that scales to arbitrary operand lengths while maintaining a constant cycle time per Booth iteration. Its key features are: (i) a novel counterflow organization, in which the data bits flow in one direction while the Booth commands piggyback on the acknowledgments flowing in the opposite direction; (ii) overlapped execution of multiple iterations of the Booth algorithm; and (iii) a modular design with bit-level pipelining, which allows the multiplier to be scaled to arbitrary operand widths without gate resizing or cycle-time overheads. SPICE simulations in a 0.18 μm TSMC CMOS process at 1.8 V show that the multiplier takes 640-650 ps per Booth iteration regardless of operand width, demonstrating the scalability of the approach. For 16-bit operands, which require eight radix-4 Booth iterations (about 5.1-5.2 ns per multiplication), this corresponds to a throughput of nearly 200 Mops/s. Furthermore, the multiplier remains fully functional at reduced supply voltages (e.g., 1.5 V and 1.0 V), and can therefore dynamically trade off performance for energy efficiency.
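The iteration count behind the 16-bit throughput figure can be illustrated with a minimal Python sketch. It models only the arithmetic of radix-4 Booth recoding: each iteration consumes two multiplier bits and adds 0, ±A or ±2A, weighted by 4^i, to the partial product. It does not model the asynchronous counterflow pipeline, the handshakes, or the overlapped iterations described above. The function names `booth_radix4_digits`, `booth_radix4_multiply` and `throughput_mops` are hypothetical. The throughput estimate also assumes that one multiplication occupies width/2 iteration cycles with no setup or drain overhead.

```python
"""Arithmetic-only sketch of radix-4 Booth multiplication.

Hypothetical helper names. This does NOT model the asynchronous
counterflow pipeline, handshakes, or overlapped iterations.
"""


def booth_radix4_digits(multiplier: int, width: int) -> list[int]:
    """Recode a width-bit two's-complement multiplier into radix-4 Booth digits.

    Digit i is -2*b[2i+1] + b[2i] + b[2i-1] (with b[-1] = 0), so every
    digit lies in {-2, -1, 0, 1, 2}.
    """
    if width <= 0 or width % 2:
        raise ValueError("sketch assumes a positive, even operand width")
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit b[-1]
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1        # b[2i]
        b1 = (bits >> (i + 1)) & 1  # b[2i+1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1                   # becomes b[2i-1] for the next window
    return digits


def booth_radix4_multiply(a: int, b: int, width: int) -> tuple[int, int]:
    """Return (a * b, iteration count), treating b as a width-bit signed value."""
    product = 0
    digits = booth_radix4_digits(b, width)
    for i, d in enumerate(digits):
        # One Booth iteration: add 0, +/-A or +/-2A, weighted by 4**i.
        product += (d * a) << (2 * i)
    return product, len(digits)


def throughput_mops(width: int, ps_per_iteration: float) -> float:
    """Estimate Mops/s, assuming width/2 iterations per multiplication
    and no setup/drain overhead (an assumption of this sketch)."""
    latency_ps = (width // 2) * ps_per_iteration
    return 1e6 / latency_ps  # 1 / (latency in microseconds) = million ops/s


if __name__ == "__main__":
    p, n = booth_radix4_multiply(-1234, 5678, 16)
    print(p, n)  # -7006652 8
    print(round(throughput_mops(16, 640), 1), round(throughput_mops(16, 650), 1))  # 195.3 192.3
```

With 640-650 ps per iteration and eight iterations for 16-bit operands, this simple model yields roughly 192-195 Mops/s, which is consistent with the "nearly 200 Mops/s" figure above.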
