FPGA implementation of real-time image convolutions with a three-level memory hierarchy

In this paper, a customized image convolution processor with a three-level memory hierarchy is implemented on Xilinx VirtexE FPGAs. Owing to its fully pipelined calculation datapath and streamlined data flow architecture, the processor achieves performance close to that of TI's highest-performance C64x processor at less than 1/8 of the clock frequency, with a substantial reduction in I/O bandwidth. Furthermore, power savings are anticipated in future ASIC implementations through further exploration of the memory hierarchy. In addition, a dedicated controller, built as a finite state machine with an incremental branch optimization architecture, is developed to control all calculation and data transfer operations.
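To make the I/O bandwidth argument concrete, the following is a minimal software sketch of how a streaming convolution can reuse pixels through a layered memory organization. It is not the processor described here: the abstract does not specify what the three levels are, so the mapping used below (external image memory, on-chip line buffers, and a small window of registers) is an assumption, as are the function name `stream_convolve`, the square kernel, and the valid-mode output. The kernel is applied without flipping, which matches true convolution only for symmetric kernels.

```python
from collections import deque


def stream_convolve(image, kernel):
    """Valid-mode sliding-window filter that reads each input pixel once.

    Assumed (hypothetical) memory levels:
      level 3: `image`, standing in for external memory
      level 2: `line_buffers`, the k most recent rows held on chip
      level 1: `window`, the k x k pixels feeding the multipliers
    Returns the output rows and the number of external pixel reads.
    """
    k = len(kernel)
    height, width = len(image), len(image[0])
    line_buffers = deque(maxlen=k)  # oldest row is dropped automatically
    external_reads = 0
    output = []

    for r in range(height):
        # Stream one row in from "external memory", one read per pixel.
        row = []
        for c in range(width):
            row.append(image[r][c])
            external_reads += 1
        line_buffers.append(row)

        # No output until k rows are available on chip.
        if len(line_buffers) < k:
            continue

        out_row = []
        for c in range(width - k + 1):
            # Fill the window from the line buffers, not from external memory.
            window = [buf[c:c + k] for buf in line_buffers]
            acc = sum(window[i][j] * kernel[i][j]
                      for i in range(k) for j in range(k))
            out_row.append(acc)
        output.append(out_row)

    return output, external_reads
```

For a 4x4 image and a 3x3 kernel, this sketch performs 16 external reads, whereas fetching all nine pixels from external memory for each of the four outputs would take 36. The gap widens with image size, which is the kind of reduction a memory hierarchy is intended to provide.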