A reconfigurable accelerator for neuromorphic object recognition

Advances in neuroscience have enabled researchers to develop computational models of auditory perception, visual perception and learning in the human brain. HMAX, a biologically inspired model of the visual cortex, has been shown to outperform standard computer vision approaches for multi-class object recognition. Although computationally demanding, HMAX could be applied to tasks such as autonomous vehicle navigation, unmanned surveillance and robotics. In this paper, we present a reconfigurable hardware accelerator for the time-consuming S2 stage of the HMAX model. The accelerator leverages spatial parallelism and dedicated wide data buses with on-chip memories to provide an energy-efficient solution suitable for adoption in embedded systems. We present a systolic array-based architecture that includes a run-time reconfigurable convolution engine capable of performing multiple variable-sized convolutions in parallel. We also describe an automation flow for this accelerator that generates optimal hardware configurations for a given algorithmic specification and handles run-time configuration and execution seamlessly. Experimental results on Virtex-6 FPGA platforms show 5X to 11X speedups and 14X to 33X higher performance-per-Watt over a CNS-based implementation on a Tesla GPU.
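To make "multiple variable-sized convolutions" concrete, the following is a minimal software sketch, not the accelerator's design. It assumes plain sliding-window multiply-accumulate (correlation, with no kernel flip) over the valid region; the exact S2 response function used in the paper is not stated here and may differ. The function names `correlate_valid` and `run_kernel_bank` are hypothetical. Each kernel in the bank is independent of the others, which is the property a parallel engine could exploit; this sketch simply evaluates them one after another.

```python
def correlate_valid(image, kernel):
    """Slide `kernel` over `image` (valid region only) and sum the products."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            # Multiply-accumulate over the kernel window anchored at (r, c).
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def run_kernel_bank(image, kernels):
    """Apply kernels of differing sizes to the same image.

    The per-kernel computations share no state, so they could run
    concurrently; here they run sequentially for clarity.
    """
    return [correlate_valid(image, k) for k in kernels]


# Illustrative 4x4 input and two kernels of different sizes (made-up values).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
kernels = [
    [[1, 1], [1, 1]],                    # 2x2 box kernel
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],   # 3x3 diagonal kernel
]
responses = run_kernel_bank(image, kernels)
```

Because the kernels differ in size, the output maps differ in size as well (3x3 and 2x2 for the inputs above), which is one reason an engine handling such workloads needs to be reconfigurable per kernel.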
