Paper information - Design and implementation of a Radix-100 division unit

Design and implementation of a Radix-100 division unit

This paper presents a Radix-100 divider based on decimal non-restoring division and the selection-by-truncation method. Two decimal quotient digits are selected in each iteration, which halves the number of iteration cycles. An initialization step is required to scale the divisor into a pre-calculated range; it is also used to generate some multiples of the scaled divisor. Implemented with the STM 90-nm standard cell library, the proposed architecture takes 14 clock cycles, or 373 FO4, to reach the desired accuracy. This latency is much shorter than that of Radix-10 dividers.
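The claim that selecting two decimal digits per iteration halves the iteration count can be illustrated with a small software model. The sketch below is not the paper's architecture: it uses exact integer digit selection (a restoring-style recurrence) instead of non-restoring division with selection by truncation, and it omits divisor scaling. The function name `digit_recurrence_divide` and its parameters are hypothetical, chosen only for this illustration.

```python
def digit_recurrence_divide(x, d, digits, radix):
    """Simplified digit-recurrence division model (illustrative only).

    Assumes positive integers with x < d, and radix 10 or 100.
    Returns (quotient, iterations), where quotient / radix**iterations
    approximates x / d from below.
    """
    if not (0 < x < d):
        raise ValueError("this sketch assumes 0 < x < d")
    # Decimal digits retired per iteration: 1 for radix 10, 2 for radix 100.
    step = {10: 1, 100: 2}[radix]
    # Ceiling division: iterations needed to produce at least `digits` digits.
    iterations = -(-digits // step)

    remainder = x
    quotient = 0
    for _ in range(iterations):
        # Shift the partial remainder left by one radix digit.
        remainder *= radix
        # Exact digit selection (assumption: real hardware would instead
        # estimate the digit from truncated operands).
        q = remainder // d
        remainder -= q * d
        # Append the new quotient digit; 0 <= q < radix because remainder < d.
        quotient = quotient * radix + q
    return quotient, iterations


q10, it10 = digit_recurrence_divide(1, 7, 16, radix=10)
q100, it100 = digit_recurrence_divide(1, 7, 16, radix=100)
print(it10, it100, q10 == q100)
```

For a 16-digit quotient, the radix-10 loop runs 16 times and the radix-100 loop runs 8 times while producing the same digits. The model counts recurrence iterations only; it says nothing about cycle time or the initialization overhead of a hardware implementation.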

Zhuo Wang | Seok-Bum Ko | Liu Han

[1] Malte Baesler, et al. FPGA Implementations of Radix-10 Digit Recurrence Fixed-Point and Floating-Point Dividers, 2011, 2011 International Conference on Reconfigurable Computing and FPGAs.

[2] Eric M. Schwarz, et al. Decimal floating-point support on the IBM System z10 processor, 2009, IBM J. Res. Dev.

[3] A. Weinberger, et al. High Speed Decimal Addition, 1971, IEEE Transactions on Computers.

[4] Michael J. Flynn, et al. Division Algorithms and Implementations, 1997, IEEE Trans. Computers.

[5] Braden Phillips, et al. Fast Decimal Floating-Point Division, 2006, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[6] Alberto Nannarelli. Radix-16 Combined Division and Square Root Unit, 2011, 2011 IEEE 20th Symposium on Computer Arithmetic.

[7] Tomás Lang, et al. Digit-recurrence dividers with reduced logical depth, 2005, IEEE Transactions on Computers.

[8] Tomás Lang, et al. Low-Power Divider, 1999, IEEE Trans. Computers.

[9] Álvaro Vázquez Álvarez. High-performance decimal floating point units, 2009.

[10] Chin-Long Wey. Design of fast high-radix SRT dividers and their VLSI implementation, 2000.

[11] Malte Baesler, et al. A radix-10 digit recurrence division unit with a constant digit selection function, 2010, 2010 IEEE International Conference on Computer Design.

[12] James Coke, et al. Improvements in the Intel Core™ 2 Penryn Processor Family Architecture and Microarchitecture, 2008.

[13] Tomás Lang, et al. A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture, 2007, IEEE Transactions on Computers.

[14] F.Y. Busaba, et al. The IBM z900 decimal arithmetic unit, 2001, Conference Record of Thirty-Fifth Asilomar Conference on Signals, Systems and Computers (Cat.No.01CH37256).

[15] Michael J. Schulte, et al. Decimal floating-point division using Newton-Raphson iteration, 2004, Proceedings. 15th IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2004.

[16] I.D. Castellanos, et al. Experiments for Decimal Floating-Point Division by Recurrence, 2006, 2006 Fortieth Asilomar Conference on Signals, Systems and Computers.

[17] Michael F. Cowlishaw, et al. Decimal floating-point: algorism for computers, 2003, Proceedings 2003 16th IEEE Symposium on Computer Arithmetic.

[18] Peter Kornerup. Revisiting SRT quotient digit selection, 2003, Proceedings 2003 16th IEEE Symposium on Computer Arithmetic.

[19] M. Cowlishaw. Densely packed decimal encoding, 2002.

[20] Donald E. Knuth. The IBM 650: An Appreciation from the Field, 1986, Annals of the History of Computing.

[21] Amir Kaivani, et al. Improving the speed of decimal division, 2011, IET Comput. Digit. Tech.

[22] Eric M. Schwarz, et al. Power6 Decimal Divide, 2007, 2007 IEEE International Conf. on Application-specific Systems, Architectures and Processors (ASAP).

[23] Dongdong Chen, et al. Algorithms and architectures for decimal transcendental function computation, 2011.

[24] Israel Koren. Computer arithmetic algorithms, 1993.

[25] Michael J. Schulte, et al. Decimal multiplication via carry-save addition, 2003, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors. ASAP 2003.

[26] M. Bayoumi, et al. Algorithms for Energy-Efficient Query-Reduction in Wireless Sensor Networks, 2007, 2006 International Workshop on Computer Architecture for Machine Perception and Sensing.

[27] Paolo Montuschi, et al. A radix-10 SRT divider based on alternative BCD codings, 2007, 2007 25th International Conference on Computer Design.

[28] Michael J. Flynn, et al. Design Issues in Division and Other Floating-Point Operations, 1997, IEEE Trans. Computers.

[29] Nishant R. Srivastava. Radix 4 SRT Division with Quotient Prediction and Operand Scaling, 2007, 2007 Design, Automation & Test in Europe Conference & Exhibition.

[30] Jean-Pierre Deschamps, et al. Decimal division: Algorithms and FPGA implementations, 2010, 2010 VI Southern Programmable Logic Conference (SPL).

[31] Herman H. Goldstine, et al. The Electronic Numerical Integrator and Computer (ENIAC), 1996, IEEE Ann. Hist. Comput.

[32] Eric M. Schwarz, et al. IBM POWER6 accelerators: VMX and DFU, 2007, IBM J. Res. Dev.

[33] Tomás Lang, et al. Very-High Radix Division with Prescaling and Selection by Rounding, 1994, IEEE Trans. Computers.

[34] James Demmel, et al. IEEE Standard for Floating-Point Arithmetic, 2008.