A High Throughput Multiplier Design Exploiting Input Based Statistical Distribution in Completion Delays

Design methodologies such as Razor minimize power dissipation by slowing circuits down until timing slack is eliminated, to the point where occasional timing errors are observed. The main challenge is designing efficient mechanisms to detect and recover from these infrequent errors without loss of functionality. We present a design for widely used Wallace multipliers where, because of the highly skewed input-based statistical distribution of completion delays, the potential for power and performance gains is significantly higher. Clock periods can potentially be reduced by a factor of 2 or more, with very rare timing errors for random input distributions. For error recovery we present a novel approach that, in every clock cycle, latches logic values at key internal circuit nodes and holds them beyond the next clock edge. This allows the correct outputs for that clock period to be generated one clock cycle later if a timing error occurs. Meanwhile, very fast error evaluation, exploiting a unique characteristic of carry-ripple addition, allows this hold to be released quickly if no error is detected, ensuring no impact on circuit timing in error-free operation. Our approach is shown to deliver performance comparable to the fastest multipliers at substantially reduced power and hardware costs.
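
The skew in completion delays can be illustrated with a small simulation. The sketch below is not the multiplier design described above. It assumes a plain ripple-carry adder model in which completion delay is taken to be proportional to the longest carry chain, and it draws operands uniformly at random. The names `longest_carry_chain` and `chain_histogram` are hypothetical helpers introduced only for this illustration.

```python
import random


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Longest run of bit positions that a single carry travels through
    when adding a and b in a width-bit ripple-carry adder (assumed model)."""
    longest = 0
    current = 0  # length of the carry chain that is currently alive
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:
            # Generate: a carry starts here regardless of the incoming carry.
            current = 1
        elif (x ^ y) and current > 0:
            # Propagate: the live carry ripples through one more position.
            current += 1
        else:
            # Kill, or propagate with no incoming carry: the chain ends.
            current = 0
        longest = max(longest, current)
    return longest


def chain_histogram(width: int, trials: int, seed: int = 0) -> list[int]:
    """Histogram of longest carry-chain lengths over random operand pairs."""
    rng = random.Random(seed)
    counts = [0] * (width + 1)
    for _ in range(trials):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        counts[longest_carry_chain(a, b, width)] += 1
    return counts


if __name__ == "__main__":
    width, trials = 32, 20000
    hist = chain_histogram(width, trials)
    for length, count in enumerate(hist):
        if count:
            print(f"chain length {length:2d}: {count}")
    long_fraction = sum(hist[width // 2 + 1:]) / trials
    print(f"fraction with chain longer than {width // 2}: {long_fraction:.5f}")
```

Under this simplified model, chains longer than half the word width are rare for random operands, which is the kind of skew that makes a clock period well below the worst-case delay plausible with only occasional timing errors. The actual delay distribution of a Wallace multiplier depends on its reduction tree and final adder, which this sketch does not model.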
