Paper information - Fast scientific computation in CMOS VLSI shared-memory multiprocessors

The authors present design considerations for fast and efficient scientific computation in CMOS VLSI in general, and in shared-memory multiprocessors in particular, using SPUR as a case study. Algorithmic and technological tradeoffs for fast floating-point arithmetic are presented, together with design issues in tightly coupled coprocessor interfaces. SPUR simulations indicate that basic arithmetic operations are three to ten times faster than on current single-chip VLSI floating-point coprocessors, and that communication overhead between CPU and FPU in a single-node system is five to ten times lower than in commercial microprocessor-based systems. System speed-up and potential bottlenecks in shared-memory multiprocessors are also discussed.

David A. Patterson | Paul Hansen | B. K. Bose | Corinna G. Lee

[1] R. Nave, et al. A numeric data processor, 1980, 1980 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[2] Tomás Lang, et al. A division algorithm with prediction of quotient digits, 1985, 1985 IEEE 7th Symposium on Computer Arithmetic (ARITH).

[3] A. K. Goksel, et al. Design of a Standard Floating Point Chip, 1985, ESSCIRC '85: 11th European Solid-State Circuits Conference.

[4] George S. Taylor, et al. Fast multiply and divide for a VLSI floating-point unit, 1987, 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).

[5] James. Peak vs. Sustained Performance in Highly Concurrent Vector Machines, 1986, Computer.

[6] C. M. Lee, et al. High-speed compact circuits with CMOS, 1982.

[7] Lev Epstein, et al. The NS32081 Floating-point Unit, 1986, IEEE Micro.

[8] Tack-Don Han, et al. Fast area-efficient VLSI adders, 1987, 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH).

[9] George S. Taylor. Radix 16 SRT dividers with overlapped quotient selection stages: A 225 nanosecond double precision divider for the S-1 Mark IIB, 1985, 1985 IEEE 7th Symposium on Computer Arithmetic (ARITH).

[10] G. Wolrich, et al. A high performance floating point coprocessor, 1984, IEEE Journal of Solid-State Circuits.

[11] H. T. Kung, et al. A Regular Layout for Parallel Adders, 1982, IEEE Transactions on Computers.

[12] N. F. Goncalves, et al. NORA: a racefree dynamic CMOS technique for pipelined logic structures, 1983.

[13] James R. Larus, et al. Design Decisions in SPUR, 1986, Computer.

[14] Daniel E. Atkins. The Theory and Implementation of SRT Division. Report No. 230, 1967.