Paper information - An FPGA-based stream processor for embedded real-time vision with Convolutional Networks


Many recent visual recognition systems can be seen as being composed of multiple layers of convolutional filter banks, interspersed with various types of non-linearities. This includes Convolutional Networks, HMAX-type architectures, as well as systems based on dense SIFT features or Histograms of Oriented Gradients. This paper describes a highly compact, low-power embedded system that can run such vision systems at very high speed. A custom board around a Xilinx Virtex-4 FPGA was built and tested. It measures 70 × 80 mm, and the complete system (FPGA, camera, memory chips, flash) consumes 15 watts at peak and is capable of more than 4 × 10^9 multiply-accumulate operations per second in real vision applications. This enables real-time implementations of object detection, object recognition, and vision-based navigation algorithms in small robots, micro-UAVs, and hand-held devices. Real-time face detection is demonstrated, with speeds of 10 frames per second at VGA resolution.

[1] Yann LeCun, et al. Generalization and network design strategies, 1989.

[2] Lawrence D. Jackel, et al. An analog neural network processor with programmable topology, 1991.

[3] Lawrence D. Jackel, et al. Application of the ANNA neural network chip to high-speed character recognition, 1992, IEEE Trans. Neural Networks.

[4] R. Shoup. Parameterized Convolution Filtering in a Field Programmable Gate Array, 1993.

[5] Will R. Moore, et al. Selected papers from the Oxford 1993 international workshop on field programmable logic and applications on More FPGAs, 1994.

[6] VIP: An FPGA-based Processor for Image Processing and Neural, 1996.

[7] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[8] Paul A. Viola, et al. Rapid object detection using a boosted cascade of simple features, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[9] Simon Haykin, et al. Gradient-Based Learning Applied to Document Recognition, 2001.

[10] Christophe Garcia, et al. Convolutional face finder: a neural architecture for fast and robust face detection, 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Yann LeCun, et al. Synergistic Face Detection and Pose Estimation with Energy-Based Models, 2004, J. Mach. Learn. Res.

[12] Thomas Serre, et al. Object recognition with features inspired by visual cortex, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13] Yann LeCun, et al. Off-Road Obstacle Avoidance through End-to-End Learning, 2005, NIPS.

[14] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15] David G. Lowe, et al. Multiclass Object Recognition with Sparse, Localized Features, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[16] Urs A. Muller, et al. A multi-range vision strategy for autonomous offroad navigation, 2007.

[17] Marc'Aurelio Ranzato, et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[18] Yann LeCun, et al. CNP: An FPGA-based processor for Convolutional Networks, 2009, 2009 International Conference on Field Programmable Logic and Applications.