Comparing Offline and Online Testing of Deep Neural Networks: An Autonomous Car Case Study

There is a growing body of research on developing testing techniques for Deep Neural Networks (DNNs). We distinguish two general modes of testing for DNNs: Offline testing where DNNs are tested as individual units based on test datasets obtained independently from the DNNs under test, and online testing where DNNs are embedded into a specific application and tested in a close-loop mode in interaction with the application environment. In addition, we identify two sources for generating test datasets for DNNs: Datasets obtained from real-life and datasets generated by simulators. While offline testing can be used with datasets obtained from either sources, online testing is largely confined to using simulators since online testing within real-life applications can be time consuming, expensive and dangerous. In this paper, we study the following two important questions aiming to compare test datasets and testing modes for DNNs: First, can we use simulator-generated data as a reliable substitute to real-world data for the purpose of DNN testing? Second, how do online and offline testing results differ and complement each other? Though these questions are generally relevant to all autonomous systems, we study them in the context of automated driving systems where, as study subjects, we use DNNs automating end-to-end control of cars’ steering actuators. Our results show that simulator-generated datasets are able to yield DNN prediction errors that are similar to those obtained by testing DNNs with real-life datasets. Further, offline testing is more optimistic than online testing as many safety violations identified by online testing could not be identified by offline testing, while large prediction errors generated by offline testing always led to severe safety violations detectable by online testing.

[1]  Jérémie Guiochet,et al.  Can Robot Navigation Bugs Be Found in Simulation? An Exploratory Study , 2017, 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS).

[2]  Brian Kingsbury,et al.  New types of deep neural network learning for speech recognition and related applications: an overview , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[3]  Germán Ros,et al.  CARLA: An Open Urban Driving Simulator , 2017, CoRL.

[4]  Matthew Wicker,et al.  Feature-Guided Black-Box Safety Testing of Deep Neural Networks , 2017, TACAS.

[5]  Wei Li,et al.  DeepBillboard: Systematic Physical-World Testing of Autonomous Driving Systems , 2018, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE).

[6]  Gordon Fraser,et al.  Automatically testing self-driving cars with search-based procedural content generation , 2019, ISSTA.

[7]  Dean Pomerleau,et al.  ALVINN, an autonomous land vehicle in a neural network , 2015 .

[8]  Liqun Sun,et al.  Metamorphic testing of driverless cars , 2019, Commun. ACM.

[9]  Shin Yoo,et al.  Guiding Deep Learning System Testing Using Surprise Adequacy , 2018, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[10]  Suman Jana,et al.  DeepTest: Automated Testing of Deep-Neural-Network-Driven Autonomous Cars , 2017, 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE).

[11]  R. Ulusay,et al.  Object Constraint Language Specification , 1997 .

[12]  Yadong Mu,et al.  Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues , 2017, ArXiv.

[13]  Jürgen Schmidhuber,et al.  Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Mark Harman,et al.  Machine Learning Testing: Survey, Landscapes and Horizons , 2019, IEEE Transactions on Software Engineering.

[15]  Alberto L. Sangiovanni-Vincentelli,et al.  Systematic Testing of Convolutional Neural Networks for Autonomous Driving , 2017, ArXiv.

[16]  Jianxiong Xiao,et al.  DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Joelle Pineau Building reproducible, reusable, and robust machine learning software , 2020, DEBS.

[18]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Sarfraz Khurshid,et al.  DeepRoad: GAN-Based Metamorphic Testing and Input Validation Framework for Autonomous Driving Systems , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[20]  Marsha Chechik,et al.  Tools and Algorithms for the Construction and Analysis of Systems , 2016, Lecture Notes in Computer Science.

[21]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[22]  Junfeng Yang,et al.  DeepXplore: Automated Whitebox Testing of Deep Learning Systems , 2017, SOSP.

[23]  Xin Zhang,et al.  End to End Learning for Self-Driving Cars , 2016, ArXiv.

[24]  Rupak Majumdar,et al.  Paracosm: A Language and Tool for Testing Autonomous Driving Systems , 2019, ArXiv.

[25]  Lei Ma,et al.  DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[26]  Georgios Fainekos,et al.  Simulation-based Adversarial Test Generation for Autonomous Vehicles with Machine Learning Components , 2018, 2018 IEEE Intelligent Vehicles Symposium (IV).