On the comparison between architectures for the implementation of distributed arithmetic

Distributed arithmetic (DA) is a method for efficiently computing inner products in which the coefficients of one vector are fixed. This paper compares different architectures for implementing DA. The area-time tradeoff study covers processors based on 1) vectors with N = 4, 8, 16, or 32 variables; 2) four different adder circuits, with and without pipelining; and 3) two memory-saving techniques. The architectures are implemented in a double-metal 1.2-μm CMOS technology within a standard-cell environment and are verified by simulation. This allows the architectures to be compared using concrete values for chip area and computation time.
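
As an illustration of the DA principle only, the following Python sketch computes an inner product with fixed coefficients bit-serially from a precomputed lookup table. It is a software model under stated assumptions, not one of the architectures compared in the paper: the function names, the 8-bit two's-complement input format, and the integer coefficients are all hypothetical choices made for the example.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the fixed coefficients.

    Entry `addr` holds the sum of coeffs[k] over all k whose bit is set
    in `addr`, so the table has 2**N entries for N coefficients.
    """
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(coeffs, xs, bits=8):
    """Bit-serial inner product of `coeffs` and `xs` using a lookup table.

    Assumes (for this sketch) that every x fits in a `bits`-wide
    two's-complement word.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if len(coeffs) != len(xs) or any(not lo <= x <= hi for x in xs):
        raise ValueError("inputs must match in length and fit in the word size")

    lut = build_lut(coeffs)
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one table address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= (((x & mask) >> b) & 1) << k
        term = lut[addr] * (1 << b)
        # The sign bit of a two's-complement word carries negative weight.
        acc += -term if b == bits - 1 else term
    return acc
```

The table grows as 2^N with the vector length N, which is why the choice of N and the use of memory-saving techniques matter for the area-time tradeoff. A quick check against a direct computation might look like `da_inner_product([3, -5, 7, 2], [10, -4, 0, 127]) == 3*10 + (-5)*(-4) + 7*0 + 2*127`.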