High-speed, energy-efficient 2-cycle Multiply-Accumulate architecture

We propose a high-speed and energy-efficient 2-cycle multiply-accumulate (MAC) architecture. The architecture is based on two's complement representation, uses guard bits to efficiently support longer MAC loops, and includes output saturation. Because carry propagation is performed only in the second stage of the MAC pipeline, multiplication and accumulation have similar delays. In contrast to previous MAC architectures that also use only one carry-propagation stage, our architecture requires no extra cycles to produce the final result: it correctly produces the sum of the accumulated value and the product in each cycle. Our place-and-route evaluation shows that the proposed architecture, averaged across several operand sizes, offers a 33% improvement in speed and a 37% reduction in energy over a conventional 2-cycle MAC architecture.
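To make the roles of guard bits and output saturation concrete, the following is a minimal behavioral sketch in Python. It is not the proposed circuit and does not model the two pipeline stages or their timing. The function names (`to_signed`, `saturate`, `mac_loop`) and the default widths (16-bit operands, 8 guard bits) are assumptions chosen for illustration only.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of `value` as a two's complement integer."""
    mask = (1 << width) - 1
    value &= mask
    # If the sign bit is set, subtract 2**width to get the negative value.
    return value - (1 << width) if value >> (width - 1) else value


def saturate(value, width):
    """Clamp `value` to the range of a signed `width`-bit integer."""
    lo = -(1 << (width - 1))
    hi = (1 << (width - 1)) - 1
    return max(lo, min(hi, value))


def mac_loop(pairs, n=16, guard=8):
    """Accumulate products of signed n-bit operands (hypothetical model).

    The accumulator register is 2*n + guard bits wide and wraps around
    like a hardware register. The result is saturated to 2*n bits.
    """
    acc_width = 2 * n + guard
    acc = 0
    for a, b in pairs:
        product = to_signed(a, n) * to_signed(b, n)
        acc = to_signed(acc + product, acc_width)  # register wrap-around
    return saturate(acc, 2 * n)


if __name__ == "__main__":
    big = [(32767, 32767)] * 4
    print(mac_loop(big, guard=8))  # guard bits keep the true sum, then saturate
    print(mac_loop(big, guard=0))  # accumulator wraps before saturation can see it
```

In this model, four accumulations of the largest positive product exceed the 32-bit output range. With 8 guard bits the accumulator still holds the true sum, so the output saturates to the largest positive 32-bit value. With no guard bits the accumulator wraps around first, and saturation cannot recover the correct result.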
