Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection

Neural Architecture Search (NAS) has achieved great success in image classification. Some recent works have explored the automatic design of efficient backbones or feature fusion layers for object detection. However, these methods search only one component of the object detector and leave the others manually designed. We observe that the inconsistency between the searched component and the manually designed ones holds the detector back from stronger performance. To this end, we propose a hierarchical trinity search framework that simultaneously discovers efficient architectures for all components (i.e., backbone, neck, and head) of the object detector in an end-to-end manner. In addition, we empirically show that different parts of the detector prefer different operators. Motivated by this, we employ a novel scheme that automatically screens a separate sub search space for each component, so that the end-to-end search for each component runs efficiently on its own sub search space. Without bells and whistles, our searched architecture, named Hit-Detector, achieves 41.4% mAP on the COCO minival set with 27M parameters. Our implementation is available at https://github.com/ggjy/HitDet.pytorch.
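
The idea of screening a separate sub search space per component can be illustrated with a minimal sketch. It assumes each candidate operator already carries a learned importance score for each component and keeps the top-scoring operators; the operator names, the scores, the `screen_sub_spaces` helper, and the top-k rule are all hypothetical and are not taken from the paper or its released code.

```python
from typing import Dict, List

# Hypothetical per-component importance scores for candidate operators.
# The values are invented for illustration only; they are not results
# from the paper.
SCORES: Dict[str, Dict[str, float]] = {
    "backbone": {"conv3x3": 0.9, "conv5x5": 0.4, "dilated3x3": 0.7, "skip": 0.1},
    "neck": {"conv3x3": 0.3, "conv5x5": 0.8, "dilated3x3": 0.6, "skip": 0.2},
    "head": {"conv3x3": 0.5, "conv5x5": 0.2, "dilated3x3": 0.9, "skip": 0.6},
}


def screen_sub_spaces(
    scores: Dict[str, Dict[str, float]], keep: int
) -> Dict[str, List[str]]:
    """Keep the `keep` highest-scoring operators for each component."""
    sub_spaces: Dict[str, List[str]] = {}
    for component, op_scores in scores.items():
        # Rank operator names by score, highest first.
        ranked = sorted(op_scores, key=op_scores.get, reverse=True)
        sub_spaces[component] = ranked[:keep]
    return sub_spaces


# Each component ends up with its own, smaller operator set.
sub_spaces = screen_sub_spaces(SCORES, keep=2)
for component, ops in sub_spaces.items():
    print(component, ops)
```

With these made-up scores, the backbone, neck, and head each retain a different pair of operators, which is the kind of component-specific preference the abstract describes. A subsequent architecture search would then only consider the retained operators for each component.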
