A Deep Learning-Based Hybrid Framework for Object Detection and Recognition in Autonomous Driving

As a key technology of intelligent transportation systems, the intelligent vehicle integrates many technologies. Although vision-based autonomous driving has shown excellent prospects, analyzing complex traffic situations from the collected data remains a challenge. Recently, autonomous driving has been formulated as several separate tasks, such as object detection and intention recognition, each handled by a different model. In this study, a vision-based system was developed to detect and identify various objects and to predict the intentions of pedestrians in traffic scenes. The main contributions of this research are: (1) an optimized model, based on the YOLOv4 architecture, that detects 10 classes of objects; (2) a fine-tuned Part Affinity Fields approach for estimating pedestrian poses; (3) the integration of Explainable Artificial Intelligence (XAI) techniques to explain and support the estimation results in the risk assessment phase; (4) a comprehensive self-driving dataset with a separate subset for each task; and (5) an end-to-end system that combines multiple high-accuracy models. Experimental results showed that the optimized YOLOv4 has 74% fewer parameters in total, which meets real-time requirements. In addition, the optimized YOLOv4 achieved a 2.6% improvement in detection precision over the state-of-the-art.
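The Part Affinity Fields (PAF) approach in contribution (2) originates in OpenPose, where a predicted 2-D vector field is used to decide which detected joints belong to the same person. The following is a minimal Python sketch of that generic association step, not the authors' fine-tuned variant. It approximates the line integral of the field along a candidate limb by uniform sampling, then keeps the highest-scoring, non-conflicting joint pairs. OpenPose solves the per-limb matching as a bipartite assignment; the greedy matching here is a simplification. The `paf` callable, the function names, and the default sample count are illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
# Hypothetical stand-in for a predicted PAF map: (x, y) -> 2-D affinity vector.
PafField = Callable[[float, float], Point]


def paf_association_score(paf: PafField, p1: Point, p2: Point,
                          num_samples: int = 10) -> float:
    """Approximate the PAF line integral along the segment p1 -> p2."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0  # Degenerate limb: no direction to compare against.
    ux, uy = dx / length, dy / length  # Unit vector along the candidate limb.
    total = 0.0
    for i in range(num_samples):
        # Uniformly spaced positions t in [0, 1]; a single sample uses the midpoint.
        t = i / (num_samples - 1) if num_samples > 1 else 0.5
        vx, vy = paf(p1[0] + t * dx, p1[1] + t * dy)
        total += vx * ux + vy * uy  # Alignment of the field with the limb direction.
    return total / num_samples


def greedy_limb_matching(scores: Dict[Tuple[int, int], float]) -> List[Tuple[int, int]]:
    """Keep the highest-scoring pairs so that each joint is used at most once.

    scores maps (part-A candidate index, part-B candidate index) -> association score.
    """
    used_a, used_b = set(), set()
    matches: List[Tuple[int, int]] = []
    for (a, b), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s <= 0.0 or a in used_a or b in used_b:
            continue  # Reject non-aligned pairs and joints already assigned.
        used_a.add(a)
        used_b.add(b)
        matches.append((a, b))
    return matches
```

With a synthetic field that points along +x everywhere, a horizontal limb from (0, 0) to (10, 0) scores 1.0, a vertical limb scores 0.0, and the reversed limb scores -1.0. Only limbs aligned with the predicted field therefore survive the matching step.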
