A Real-Time 17-Scale Object Detection Accelerator With Adaptive 2000-Stage Classification in 65 nm CMOS

Machine learning has become ubiquitous in applications including object detection, image/video classification, and natural language processing. Although machine learning algorithms have been used successfully in many practical applications, implementing them in hardware that is accurate, fast, and low-power remains challenging, especially for mobile systems such as Internet of Things (IoT) devices, autonomous vehicles, and smart drones. This paper presents an energy-efficient programmable ASIC accelerator for object detection. The accelerator supports multiple programmable object classes (e.g., faces, traffic signs, car license plates, and pedestrians), many objects (up to 50) per image at different sizes (17 scales: 6 down-scaling and 11 up-scaling), and high accuracy (average precision (AP) of 0.87, 0.81, 0.72, and 0.76 on the FDDB, AFW, BTSD, and Caltech datasets, respectively). We designed an integral channel detector with 2,000 classifiers for rigid boosted templates, in which the number of stages used for classification is adaptively controlled depending on the content of each search window. Compared with support vector machine (SVM) and deformable parts model (DPM) designs, this approach can be implemented with more modular hardware. By jointly optimizing the algorithm and the hardware architecture, the prototype chip, implemented in 65 nm CMOS, demonstrates real-time object detection at 20–50 frames/s with a power consumption of 22.5–181.7 mW (0.54–1.75 nJ/pixel) at a 0.58–1.1 V supply.
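The adaptive-stage idea can be illustrated with a small software sketch. It assumes decision-stump weak classifiers that threshold rectangle sums computed from an integral image of a single feature channel, plus a per-stage rejection threshold on the running boosted score (a soft-cascade-style early-exit rule). The function names (integral_image, rect_sum, classify_window), the stump parameters, and the toy image are hypothetical; they are not taken from the chip's trained 2,000-classifier model or its hardware datapath.

```python
def integral_image(channel):
    """Summed-area table with one row/column of zero padding:
    ii[y][x] equals the sum of channel[0:y][0:x]."""
    h, w = len(channel), len(channel[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += channel[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, y, x, h, w):
    """Sum of the h-by-w rectangle with top-left corner (y, x), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def classify_window(iis, y0, x0, stumps, reject_thresholds):
    """Evaluate boosted stumps on one search window, stopping early.

    iis: one integral image per feature channel.
    stumps: (channel, dy, dx, h, w, threshold, low_score, high_score) tuples
            (hypothetical format); each compares one rectangle sum to a threshold.
    reject_thresholds: lower bound on the running score after each stage;
            the window is rejected as soon as the score drops below it.
    Returns (is_object, stages_used).
    """
    score = 0.0
    for stage, (ch, dy, dx, h, w, thr, low, high) in enumerate(stumps):
        feature = rect_sum(iis[ch], y0 + dy, x0 + dx, h, w)
        score += high if feature > thr else low
        if score < reject_thresholds[stage]:
            return False, stage + 1  # early exit: remaining stages skipped
    return True, len(stumps)


# Toy 32x32 channel containing a bright 8x8 "object" at rows/cols 8..15.
img = [[1.0 if 8 <= y < 16 and 8 <= x < 16 else 0.0 for x in range(32)]
       for y in range(32)]
iis = [integral_image(img)]

# Two hypothetical stumps on an 8x8 window: is the centre bright? is the block bright?
stumps = [
    (0, 2, 2, 4, 4, 8.0, -1.0, 1.0),
    (0, 0, 0, 8, 8, 32.0, -1.0, 1.0),
]
reject = [0.0, 0.0]

print(classify_window(iis, 8, 8, stumps, reject))    # (True, 2): all stages run
print(classify_window(iis, 20, 20, stumps, reject))  # (False, 1): rejected after stage 1
```

In this sketch a background window costs one stage while a window on the object runs every stage, so the average work per window depends on image content; this is the general property an adaptive-stage classifier exploits, not a model of how the chip schedules its stages.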
