A high throughput FPGA implementation of a bit-level matrix-matrix product

This paper presents a novel architecture for a matrix-matrix multiplication algorithm. The paper describes the mathematical model for the algorithm (based on Baugh-Wooley algorithm), the associated design and implementation of the algorithm on a Xilinx FPGA board, and discusses the efficiency of the implementation requiring (N/sup 2/) and O(2nN) as area and time complexities respectively, where N is the matrix size and n is the word length.