A 58.6 mW 30 Frames/s Real-Time Programmable Multiobject Detection Accelerator With Deformable Parts Models on Full HD $1920\times 1080$ Videos

This paper presents a programmable, energy-efficient, real-time object detection hardware accelerator for low-power, high-throughput applications. It uses deformable parts models, which give $2\times$ higher detection accuracy than traditional rigid body models. Three methods address the high computational complexity of detecting eight deformable parts: classification pruning for $33\times$ fewer part classifications, vector quantization for a $15\times$ reduction in memory size, and feature basis projection for a $2\times$ reduction in the cost of each classification. The chip was fabricated in a 65 nm CMOS technology and can process full high definition $1920\times 1080$ videos at 60 frames/s without any off-chip storage. The chip has two programmable classification engines (CEs) for multiobject detection. At 30 frames/s, the chip consumes only 58.6 mW (0.94 nJ/pixel, 1168 GOPS/W). At the higher throughput of 60 frames/s, the CEs can be time multiplexed to detect more than two object classes. The proposed accelerator enables object detection to be as energy-efficient as video compression, which is found in most cameras today.
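The energy figure quoted above follows directly from the power, frame rate, and resolution. The short sketch below reproduces that arithmetic; the function names are illustrative, and the implied throughput assumes that the 1168 GOPS/W figure was measured at the same 30 frames/s operating point, which the abstract does not state explicitly.

```python
# Figures taken from the abstract.
POWER_W = 58.6e-3             # chip power at 30 frames/s, in watts
FRAME_RATE = 30               # frames per second
WIDTH, HEIGHT = 1920, 1080    # full HD resolution
EFFICIENCY_GOPS_PER_W = 1168  # reported energy efficiency


def energy_per_pixel_nj(power_w, fps, width, height):
    """Energy spent per processed pixel, in nanojoules."""
    pixels_per_second = fps * width * height
    return power_w / pixels_per_second * 1e9


def implied_throughput_gops(power_w, gops_per_w):
    """Throughput implied by power and efficiency (assumes the same operating point)."""
    return power_w * gops_per_w


if __name__ == "__main__":
    nj = energy_per_pixel_nj(POWER_W, FRAME_RATE, WIDTH, HEIGHT)
    gops = implied_throughput_gops(POWER_W, EFFICIENCY_GOPS_PER_W)
    print(f"{nj:.2f} nJ/pixel")  # 0.94 nJ/pixel
    print(f"{gops:.1f} GOPS")    # 68.4 GOPS
```

The first result matches the 0.94 nJ/pixel in the abstract. The second is a derived estimate, not a number reported in the text.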
