Fast scientific computation in CMOS VLSI shared-memory multiprocessors

The authors present design considerations for fast and efficient scientific computation in CMOS VLSI in general, and in shared-memory multiprocessors in particular, using SPUR as a case study. Algorithmic and technological tradeoffs for fast floating-point arithmetic are presented, together with design issues in tightly coupled coprocessor interfaces. SPUR simulations indicate that basic arithmetic operations are three to ten times faster than on current single-chip VLSI floating-point coprocessors, and that communication overhead between CPU and FPU in a single-node system is five to ten times lower than in commercial microprocessor-based systems. System speed-up and potential bottlenecks in shared-memory multiprocessors are also presented.
