Paracosm: A Language and Tool for Testing Autonomous Driving Systems

Systematic testing of autonomous vehicles operating in complex real-world scenarios is a difficult and expensive problem. We present Paracosm, a reactive language for writing test scenarios for autonomous driving systems. Paracosm allows users to programmatically describe complex driving situations with specific visual features, e.g., road layout in an urban environment, as well as reactive temporal behaviors of cars and pedestrians. Paracosm programs are executed on top of a game engine that provides realistic physics simulation and visual rendering. The infrastructure allows systematic exploration of the state space, both for visual features (lighting, shadows, fog) and for reactive interactions with the environment (pedestrians, other traffic). We define a notion of test coverage for Paracosm configurations based on combinatorial testing and low-dispersion sequences. Paracosm comes with an automatic test-case generator that uses random sampling for discrete parameters and deterministic quasi-Monte Carlo generation for continuous parameters. Through an empirical evaluation, we demonstrate the modeling and testing capabilities of Paracosm on a suite of autonomous driving systems, developed in research and education, that are implemented using deep neural networks. We show how Paracosm can expose incorrect behaviors or degraded performance.
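
The split between random sampling for discrete parameters and deterministic quasi-Monte Carlo generation for continuous parameters can be illustrated with a minimal Python sketch. It is not Paracosm's implementation: it assumes a Halton sequence as the low-dispersion sequence, and the function names (`radical_inverse`, `generate_tests`) and the parameter names (`weather`, `fog_density`, `sun_angle`) are hypothetical, chosen only for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[str, float]

# First few primes, used as pairwise-coprime bases for the Halton sequence.
_PRIMES = [2, 3, 5, 7, 11, 13]


def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse: mirror the base-`base` digits of
    `index` around the radix point, giving a value in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def generate_tests(
    discrete: Dict[str, Sequence[str]],
    continuous: Dict[str, Tuple[float, float]],
    count: int,
    seed: int = 0,
) -> List[Dict[str, Value]]:
    """Build `count` test configurations (hypothetical helper).

    Discrete parameters are drawn uniformly at random from their value lists.
    Continuous parameters follow a deterministic Halton sequence scaled to
    each (low, high) range, so they fill the space evenly.
    """
    if len(continuous) > len(_PRIMES):
        raise ValueError("this sketch supports at most 6 continuous parameters")
    rng = random.Random(seed)
    names = list(continuous)
    tests: List[Dict[str, Value]] = []
    for i in range(1, count + 1):  # start at 1 to skip the all-zero point
        test: Dict[str, Value] = {
            name: rng.choice(list(values)) for name, values in discrete.items()
        }
        for name, base in zip(names, _PRIMES):
            low, high = continuous[name]
            test[name] = low + radical_inverse(i, base) * (high - low)
        tests.append(test)
    return tests


if __name__ == "__main__":
    for config in generate_tests(
        {"weather": ["clear", "fog"]},
        {"fog_density": (0.0, 1.0), "sun_angle": (0.0, 90.0)},
        count=4,
    ):
        print(config)
```

In this sketch the continuous values are identical on every run, while the discrete choices depend only on the seed, so a generated test suite can be reproduced exactly.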
