FPGA-Based Implementation of a Real-Time Object Recognition System Using Convolutional Neural Network

The high computational complexity and power consumption of convolutional neural networks (CNNs) make them ill-suited to real-time embedded applications. In this brief, we introduce a low-power, flexible platform that serves as a hardware accelerator for CNNs. The proposed architecture is fully configurable through a software library, so the same reconfigurable hardware can execute different CNN models. The accelerator is evaluated on a ZC706 evaluation board. We use the AlexNet architecture in a real-time object recognition application to demonstrate the effectiveness of the proposed CNN accelerator. The results show that the convolution layers reach 198.1 GOP/s using 512 DSP blocks and the fully connected layers reach 23.14 GOP/s using 64 DSP blocks. Moreover, images are processed at 82 frames/s, which is significantly higher than the frame rates of existing implementations.
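Because the convolution and fully connected layers are reported with different DSP budgets, it can help to normalize the quoted throughput by DSP count and to convert the frame rate into a per-frame time budget. The sketch below is only illustrative arithmetic on the numbers stated above. The helper names `gops_per_dsp` and `frame_period_ms` are hypothetical, and the per-DSP figure is a derived ratio rather than a metric reported by the authors. The frame period is the average time between frames (1/fps) and should not be read as end-to-end latency, since a pipelined accelerator may overlap work on successive frames.

```python
# Figures quoted in the abstract; everything else is derived arithmetic.
CONV_GOPS, CONV_DSPS = 198.1, 512   # convolution layers
FC_GOPS, FC_DSPS = 23.14, 64        # fully connected layers
FPS = 82                            # reported AlexNet frame rate


def gops_per_dsp(gops: float, dsps: int) -> float:
    """Hypothetical helper: throughput (GOP/s) divided by DSP blocks used."""
    return gops / dsps


def frame_period_ms(fps: float) -> float:
    """Hypothetical helper: average time between frames in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(f"conv: {gops_per_dsp(CONV_GOPS, CONV_DSPS):.3f} GOP/s per DSP")
    print(f"fc:   {gops_per_dsp(FC_GOPS, FC_DSPS):.3f} GOP/s per DSP")
    print(f"frame period: {frame_period_ms(FPS):.1f} ms")
```

Running it prints roughly 0.387 GOP/s per DSP for the convolution layers, 0.362 GOP/s per DSP for the fully connected layers, and a frame period of about 12.2 ms, which suggests that both layer types keep their allotted DSP blocks similarly busy under this simple normalization.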