VATLD: A Visual Analytics System to Assess, Understand and Improve Traffic Light Detection

Traffic light detection is crucial for environment perception and decision-making in autonomous driving. State-of-the-art detectors are built upon deep Convolutional Neural Networks (CNNs) and have exhibited promising performance. However, one looming concern with CNN-based detectors is how to thoroughly evaluate their accuracy and robustness before they can be deployed in autonomous vehicles. In this work, we propose VATLD, a visual analytics system equipped with disentangled representation learning and semantic adversarial learning, to assess, understand, and improve the accuracy and robustness of traffic light detectors in autonomous driving applications. The disentangled representation learning extracts data semantics to augment human cognition with human-friendly visual summaries, and the semantic adversarial learning efficiently exposes interpretable robustness risks and yields actionable insights with minimal human interaction. We also use VATLD to demonstrate the effectiveness of various performance improvement strategies derived from these actionable insights, and illustrate some practical implications for safety-critical applications in autonomous driving.
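The abstract does not spell out how the semantic adversarial learning works. As a rough illustration only, the Python sketch below shows the general idea behind a semantic (rather than pixel-level) attack. It perturbs one interpretable latent dimension at a time within a budget and records which dimensions push the detector's confidence down. Everything here is a hypothetical assumption and not VATLD's actual algorithm: `toy_score` stands in for "decode the latent vector, then run the detector", and the grid search, `semantic_attack`, `risky_dimensions`, the budget `epsilon`, and the 0.5 threshold are illustrative choices.

```python
import math
from typing import Callable, Dict, List, Tuple


def semantic_attack(
    z: List[float],
    score_fn: Callable[[List[float]], float],
    epsilon: float = 1.0,
    steps: int = 21,
) -> Dict[int, Tuple[float, float]]:
    """Hypothetical coordinate-wise semantic attack.

    For each latent dimension, try evenly spaced shifts in [-epsilon, epsilon]
    while keeping the other dimensions fixed, and keep the shift that gives
    the lowest detector confidence. Returns {dim: (best_shift, lowest_score)}.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    results: Dict[int, Tuple[float, float]] = {}
    base_score = score_fn(z)
    for dim in range(len(z)):
        best_shift, best_score = 0.0, base_score
        for i in range(steps):
            shift = -epsilon + 2 * epsilon * i / (steps - 1)
            candidate = list(z)  # copy so the original latent code is untouched
            candidate[dim] += shift
            score = score_fn(candidate)
            if score < best_score:
                best_shift, best_score = shift, score
        results[dim] = (best_shift, best_score)
    return results


def risky_dimensions(
    results: Dict[int, Tuple[float, float]], threshold: float = 0.5
) -> List[int]:
    """Dimensions whose worst-case confidence falls below the threshold."""
    return [dim for dim, (_, score) in results.items() if score < threshold]


def toy_score(z: List[float]) -> float:
    """Hypothetical stand-in for decoder + detector confidence.

    Dimension 0 (think "brightness") strongly affects confidence;
    dimension 1 barely does.
    """
    return 1.0 / (1.0 + math.exp(-(2.0 + 3.0 * z[0] + 0.1 * z[1])))


if __name__ == "__main__":
    res = semantic_attack([0.0, 0.0], toy_score)
    print(risky_dimensions(res))  # [0]
```

Searching one dimension at a time keeps each failure attributable to a single, nameable semantic factor, which is what makes the exposed risk interpretable. A practical system would more likely use a gradient-based or black-box optimizer over several dimensions at once.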
