Fast hardware units for the computation of accurate dot products

The principle of operation of fast hardware units for computing accurate dot products is described, and a hardware description is provided. The circuits presented can readily be implemented on a single chip with available techniques and deliver a high-performance solution for dot product computations. Apart from the RAM for the long accumulator, all units can also be used for scalar operations, which avoids the hardware overhead of separate scalar and vector units. The additional hardware for a combined scalar and vector computation unit amounts to about 120 K transistors, which makes the design applicable to PCs as well. The entire dataflow, from the input interface down to the accumulation, is handled. The pipeline of the first unit has 24 stages, while the pipeline of the second has only six stages for the entire process. These units are therefore also suitable for short vectors and for computations with complex numbers.
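The long-accumulator idea can be illustrated in software. The following is a minimal sketch, not the hardware design described here: it assumes IEEE 754 double-precision inputs and uses a Python arbitrary-precision integer as a stand-in for the long accumulator RAM. The names `accurate_dot` and `SCALE` are hypothetical and chosen for illustration only. Every product is added exactly in fixed point, and rounding happens only once, at the end.

```python
# Software sketch of a long (fixed-point) accumulator for dot products.
# Assumption: inputs are finite IEEE 754 doubles; a Python int stands in
# for the accumulator RAM. Names are illustrative, not from the paper.

# The smallest positive double is 2**-1074, so every product of two
# doubles is an integer multiple of 2**-2148.
SCALE = 1 << (2 * 1074)

def accurate_dot(xs, ys):
    """Return the dot product of xs and ys, rounded only once."""
    acc = 0  # exact fixed-point accumulator, in units of 2**-2148
    for x, y in zip(xs, ys):
        # as_integer_ratio() gives an exact fraction n/d with d a power
        # of two; it raises for infinities and NaNs.
        nx, dx = float(x).as_integer_ratio()
        ny, dy = float(y).as_integer_ratio()
        # dx * dy divides SCALE, so this addition introduces no rounding.
        acc += nx * ny * (SCALE // (dx * dy))
    # Integer true division in Python is correctly rounded, so this is
    # the single rounding step of the whole computation.
    return acc / SCALE

xs = [1e16, 1.0, -1e16]
ys = [1.0, 1.0, 1.0]
naive = sum(x * y for x, y in zip(xs, ys))  # loses the 1.0 term
exact = accurate_dot(xs, ys)                # keeps it
```

In this example the naive floating-point sum cancels to 0.0 because 1e16 + 1.0 is not representable as a double, whereas the accumulator version returns 1.0. A hardware unit would use a fixed-width register file rather than an unbounded integer, so the sketch only conveys the principle of exact accumulation followed by one rounding.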
