Cost/performance tradeoff of n-select square root implementations

Hardware square-root units require large numbers of gates, even for iterative implementations. In this paper we present four low-cost, high-performance, fully pipelined n-select implementations (nS-Root) based on a non-restoring-remainder square root algorithm. The nS-Root uses a parallel array of carry-save adders (CSAs). Each square-root bit calculation uses a CSA only once, so the calculations can be fully pipelined. The nS-Root also uses an n-way root-select technique to speed up the square root calculation. The cost/performance evaluation shows that n=2 or n=2.5 is a suitable choice for designing a high-speed, fully pipelined square root unit while keeping cost low.
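As a point of orientation, the bit-serial non-restoring recurrence that designs of this kind build on can be sketched in software. The sketch below is only a sequential Python model of a textbook non-restoring integer square root. It does not model the CSA array, the pipelining, or the n-way root select described in the abstract, and the function name `nonrestoring_isqrt` and its `root_bits` parameter are hypothetical names chosen for illustration.

```python
from typing import Tuple


def nonrestoring_isqrt(d: int, root_bits: int = 16) -> Tuple[int, int]:
    """Return (q, r) with q = floor(sqrt(d)) and r = d - q*q.

    Illustrative software model only: one loop iteration produces one
    root bit, and a negative partial remainder is never restored inside
    the loop (it is corrected once, after the last bit).
    """
    if d < 0 or d >= 4 ** root_bits:
        raise ValueError("radicand must fit in 2 * root_bits bits")

    q = 0  # partial root, grows by one bit per iteration
    r = 0  # partial remainder, may go negative (non-restoring)

    for i in range(root_bits - 1, -1, -1):
        pair = (d >> (2 * i)) & 0b11  # next two radicand bits
        if r >= 0:
            # Bring down two bits, then subtract the trial value (q, 0, 1).
            r = r * 4 + pair - (q * 4 + 1)
        else:
            # Bring down two bits, then add (q, 1, 1) instead of restoring.
            r = r * 4 + pair + (q * 4 + 3)
        # The sign of the new remainder selects the next root bit.
        q = q * 2 + 1 if r >= 0 else q * 2

    if r < 0:
        # Single final correction so the returned remainder is exact.
        r += q * 2 + 1
    return q, r
```

For example, `nonrestoring_isqrt(10, root_bits=2)` returns `(3, 1)`, since 10 = 3² + 1. Each iteration performs exactly one addition or subtraction, which is the property that lets a hardware version dedicate one adder stage per root bit.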

Yamin Li | Wanming Chu
