An O(1) Time Optimal Algorithm for Multiplying Matrices on Reconfigurable Mesh

The complexity of parallel computations on VLSI has been measured in terms of the $AT^2$ product, where $A$ is the VLSI layout area of the design and $T$ is the computation time using area $A$. It has been shown that the lower bound for multiplying two $N \times N$ matrices is $AT^2 = \Omega(N^4)$ in the word model of VLSI [9]. VLSI architectures for matrix multiplication have been studied extensively during the past decade. In [5], hexagonal systolic arrays were shown to be optimal, with $T = O(N)$ and $A = O(N^2)$, so that $AT^2 = O(N^2 \cdot N^2) = O(N^4)$ matches the lower bound. Also, it has been shown that an $AT^2$-optimal VLSI architecture for $T \in$