Cost/performance tradeoff of n-select square root implementations

Hardware square-root units require large numbers of gates, even in iterative implementations. In this paper we present four low-cost, high-performance, fully pipelined n-select implementations (nS-Root) based on a non-restoring-remainder square root algorithm. The nS-Root uses a parallel array of carry-save adders (CSAs), and each CSA is used only once, in the calculation of a single square root bit. Because no CSA is reused across bits, the calculation can be fully pipelined. The nS-Root also uses the n-way root-select technique to speed up the square root calculation. The cost/performance evaluation shows that n=2 or n=2.5 is a suitable choice for designing a high-speed, fully pipelined square root unit while keeping the cost low.
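To make the non-restoring-remainder recurrence concrete, here is a minimal Python sketch, assuming an unsigned radicand `d` of `2n` bits. The function names `nonrestoring_isqrt` and `carry_save_add` are hypothetical. Each loop iteration produces one root bit with a single add or subtract on the partial remainder, chosen by the sign of the previous remainder and with no restoring step; in a fully pipelined array, each iteration would map to its own stage. This is an illustration only: the recurrence uses ordinary integer addition rather than carry-save arithmetic, and it does not model the n-way root-select technique or the actual nS-Root datapath. The separate helper shows the 3:2 carry-save addition that a single CSA performs.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save addition (one CSA level) of non-negative integers.

    Returns (sum, carry) with sum + carry == a + b + c; no carry
    propagates between bit positions inside the adder itself.
    """
    s = a ^ b ^ c                                # per-bit sum
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, weighted x2
    return s, carry


def nonrestoring_isqrt(d: int, n: int) -> tuple[int, int]:
    """Non-restoring-remainder integer square root (illustrative sketch).

    For a radicand d of at most 2n bits, returns (q, r) with
    q = floor(sqrt(d)) and r = d - q*q. One root bit per iteration.
    """
    if n < 1 or d < 0 or d >= 1 << (2 * n):
        raise ValueError("d must be a non-negative integer of at most 2n bits")
    q = 0  # partial root
    r = 0  # partial remainder; allowed to go negative (never restored)
    for i in range(n - 1, -1, -1):
        pair = (d >> (2 * i)) & 0b11  # next two radicand bits, MSB first
        if r >= 0:
            # remainder non-negative (first step, or last root bit was 1):
            # shift in two bits and subtract 4q + 1
            r = (r << 2) + pair - ((q << 2) | 1)
        else:
            # remainder negative: add 4q + 3 instead of restoring it
            r = (r << 2) + pair + ((q << 2) | 3)
        # the sign of the new remainder selects the next root bit
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:
        # final correction so that r == d - q*q
        r += (q << 1) | 1
    return q, r
```

For example, `nonrestoring_isqrt(200, 4)` returns `(14, 4)`, since 14 × 14 = 196. Because the add/subtract decision depends only on the sign of the previous remainder, every iteration has the same structure, which is what allows the iterations to be unrolled into an array of identical stages.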
