Fast Radix-10 Multiplication Using Redundant BCD Codes

We present the algorithm and architecture of a BCD parallel multiplier that exploits properties of two different redundant BCD codes to speed up its computation: the redundant BCD excess-3 code (XS-3) and the overloaded BCD representation (ODDS). In addition, new techniques are developed to significantly reduce the latency and area of previous representative high-performance implementations. Partial products are generated in parallel using a signed-digit radix-10 recoding of the BCD multiplier with the digit set [-5, 5], and a set of positive multiplicand multiples (0X, 1X, 2X, 3X, 4X, 5X) coded in XS-3. This encoding has several advantages. First, it is a self-complementing code, so a negative multiplicand multiple can be obtained simply by inverting the bits of the corresponding positive one. Second, the available redundancy allows fast and simple carry-free generation of the multiplicand multiples. Finally, the partial products can be recoded to the ODDS representation simply by adding a constant factor into the partial product reduction tree. Since ODDS uses a 4-bit binary encoding similar to that of non-redundant BCD, conventional binary VLSI circuit techniques, such as binary carry-save adders and compressor trees, can be adapted efficiently to perform decimal operations. To show the advantages of our architecture, we have synthesized an RTL model for $16\times 16$-digit and $34\times 34$-digit multiplications and performed a comparative survey of the most representative previous designs. We show that, for similar target delays, the proposed decimal multiplier improves area by roughly 20 to 35 percent with respect to the fastest implementation.
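
The two ideas named above, the signed-digit radix-10 recoding and the self-complementing property of XS-3, can be illustrated with a minimal software sketch. This is not the hardware design; it rests on stated assumptions. The function names (`sd_radix10_recode`, `xs3_encode`, and so on) are hypothetical. The recoding rule used here (emit a transfer of 1 to the next position whenever a digit is 5 or more) is one common way to reach the digit set [-5, 5] and is assumed rather than taken from the text. The redundant XS-3 digit is assumed to be a 4-bit code whose value is the code minus 3, so it spans [-3, 12].

```python
def sd_radix10_recode(bcd_digits):
    """Recode BCD digits (least significant first, each 0..9) into
    signed digits in [-5, 5]. Assumed rule: a digit >= 5 sends a
    transfer of 1 to the next position and keeps digit - 10."""
    recoded = []
    transfer_in = 0
    for y in bcd_digits:
        if not 0 <= y <= 9:
            raise ValueError("not a BCD digit: %r" % (y,))
        transfer_out = 1 if y >= 5 else 0
        recoded.append(y - 10 * transfer_out + transfer_in)
        transfer_in = transfer_out
    recoded.append(transfer_in)  # final transfer becomes an extra digit
    return recoded


def digits_value(digits):
    """Value of a least-significant-first radix-10 digit list."""
    return sum(d * 10 ** i for i, d in enumerate(digits))


def multiply_via_recoding(x, multiplier_bcd_digits):
    """Sum the partial products d_i * X * 10^i, where every |d_i| <= 5,
    so only the multiples 0X..5X and a sign are ever needed."""
    multiples = [k * x for k in range(6)]  # 0X, 1X, ..., 5X
    product = 0
    for i, d in enumerate(sd_radix10_recode(multiplier_bcd_digits)):
        partial = multiples[abs(d)]
        product += (-partial if d < 0 else partial) * 10 ** i
    return product


def xs3_encode(digit):
    """Assumed redundant XS-3: 4-bit code = digit + 3, digit in [-3, 12]."""
    if not -3 <= digit <= 12:
        raise ValueError("digit outside [-3, 12]: %r" % (digit,))
    return digit + 3


def xs3_decode(code):
    return code - 3


def xs3_invert(code):
    """Bitwise inversion of the 4-bit code."""
    return code ^ 0xF


print(sd_radix10_recode([9, 7, 2]))          # 279 -> [-1, -2, 3, 0]
print(multiply_via_recoding(123, [9, 7, 2])) # 123 * 279 = 34317
print(xs3_decode(xs3_invert(xs3_encode(4)))) # 9 - 4 = 5
```

Under these assumptions, inverting the four bits of a digit with value d yields the digit 9 - d, which is the digit-wise nine's complement. Inverting every digit of an n-digit operand X therefore gives 10^n - 1 - X, so a negative multiple differs from the inverted positive one only by an added 1 in the least significant position.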
