Foveated Image Processing for Faster Object Detection and Recognition in Embedded Systems Using Deep Convolutional Neural Networks

Object detection and recognition algorithms based on deep convolutional neural networks (CNNs) are computationally intensive. This is a particular challenge for embedded systems, such as mobile robots, whose computational resources are far more limited than those of workstations. As an alternative to standard, uniformly sampled images, we propose foveated image sampling to reduce image size: smaller images are faster to process in a CNN because they require fewer convolution operations. We evaluate object detection and recognition on the Microsoft COCO database, using foveated image sampling at image sizes ranging from \(416\times 416\) down to \(96\times 96\) pixels, on an embedded GPU, an NVIDIA Jetson TX2 with 256 CUDA cores. The results show that a speed-up of roughly \(4{\times }\) in frame rate is achievable, from 3.59 FPS with \(416\times 416\) pixel images to 15.24 FPS with \(128\times 128\) pixel images. With foveated sampling, this reduction in image size led to only a small decrease in recall in the foveal region, to 92.0% of the baseline recall obtained with full-sized images. With uniform sampling, by contrast, recall fell to 50.1% of the baseline, demonstrating the advantage of foveated sampling.
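
To make the idea of foveated sampling concrete, the following is a minimal sketch and not the sampling scheme evaluated in this work. It assumes a separable power-law warp about the image centre with nearest-neighbour lookup; the function name `foveated_sample` and the parameter `fovea_gain` are hypothetical and introduced only for illustration. The output keeps the full field of view but samples the centre more densely than the periphery.

```python
import numpy as np


def foveated_sample(image, out_size, fovea_gain=2.0):
    """Resample an H x W (x C) image to out_size x out_size.

    Illustrative assumption: each output coordinate s in [-1, 1] is mapped
    to the source coordinate sign(s) * |s| ** fovea_gain. For fovea_gain > 1
    neighbouring output pixels near the centre map to nearby source pixels
    (dense sampling), while those near the border map further apart.
    """
    h, w = image.shape[:2]
    # Output grid in normalised coordinates, shared by both axes.
    s = np.linspace(-1.0, 1.0, out_size)
    # Power-law warp: fine steps near 0, coarse steps near +/-1.
    src = np.sign(s) * np.abs(s) ** fovea_gain
    # Convert normalised source coordinates to integer pixel indices.
    rows = np.rint((src + 1.0) / 2.0 * (h - 1)).astype(int)
    cols = np.rint((src + 1.0) / 2.0 * (w - 1)).astype(int)
    # Nearest-neighbour lookup over the first two axes (channels are kept).
    return image[np.ix_(rows, cols)]


# Usage: shrink a 416 x 416 RGB frame to 128 x 128 before passing it to a CNN.
frame = np.zeros((416, 416, 3), dtype=np.uint8)
small = foveated_sample(frame, 128)
print(small.shape)  # (128, 128, 3)

# Ratio of input pixels between the two sizes compared in the abstract.
pixel_ratio = (416 * 416) / (128 * 128)
```

Going from \(416\times 416\) to \(128\times 128\) reduces the number of input pixels by a factor of about 10.6, which is why a smaller input needs far fewer convolution operations. Nearest-neighbour lookup is used here only for brevity; an interpolating or area-averaging sampler would reduce aliasing in the periphery.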
