Leveraging combinatorial testing for safety-critical computer vision datasets

Deep learning-based approaches have gained popularity for environment perception tasks such as semantic segmentation and object detection from images. However, data-driven deep neural networks (DNNs) differ fundamentally from conventional software, which makes established software verification practices difficult to apply. In this work, we show how existing methods from software engineering benefit the development of a DNN, in particular dataset design and analysis. We show how combinatorial testing based on a domain model can be leveraged to generate test sets that provide coverage guarantees with respect to important environmental features and their interactions. Additionally, we show how our approach can be used to grow a dataset, i.e., to identify where data is missing and should be collected next. We evaluate our approach on an internal use case and two public datasets.
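To make the idea of combinatorial coverage over a domain model concrete, the following is a minimal sketch and not the method used in this work. It assumes a hypothetical domain model with three environmental features (`weather`, `time_of_day`, `road_type`) whose names and values are purely illustrative. It uses a simple greedy strategy to select feature combinations that cover every pairwise (2-way) interaction, and it reports which interactions an existing dataset does not yet cover.

```python
from itertools import combinations, product

# Hypothetical domain model: feature names and values are illustrative
# assumptions, not taken from the paper.
DOMAIN = {
    "weather": ["clear", "rain", "fog"],
    "time_of_day": ["day", "night"],
    "road_type": ["urban", "highway"],
}


def all_pairs(domain):
    """Return every 2-way interaction: one value from each of two features."""
    pairs = set()
    for f1, f2 in combinations(sorted(domain), 2):
        for v1, v2 in product(domain[f1], domain[f2]):
            pairs.add(((f1, v1), (f2, v2)))
    return pairs


def pairs_of(sample):
    """Return the 2-way interactions exercised by one sample (feature -> value)."""
    return set(combinations(sorted(sample.items()), 2))


def greedy_pairwise(domain):
    """Greedily pick full feature combinations until all pairs are covered."""
    features = sorted(domain)
    candidates = [
        dict(zip(features, values))
        for values in product(*(domain[f] for f in features))
    ]
    uncovered = all_pairs(domain)
    tests = []
    while uncovered:
        # Choose the candidate that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        tests.append(best)
        uncovered -= pairs_of(best)
    return tests


def missing_pairs(domain, dataset):
    """Return the 2-way interactions that no sample in the dataset covers."""
    covered = set()
    for sample in dataset:
        covered |= pairs_of(sample)
    return all_pairs(domain) - covered


if __name__ == "__main__":
    tests = greedy_pairwise(DOMAIN)
    print(f"{len(tests)} combinations cover all {len(all_pairs(DOMAIN))} pairs")

    # A toy dataset described only by its (assumed) feature annotations.
    dataset = [{"weather": "clear", "time_of_day": "day", "road_type": "urban"}]
    for pair in sorted(missing_pairs(DOMAIN, dataset)):
        print("missing:", pair)
```

The list returned by `missing_pairs` is one way to read the dataset-growing use case: each uncovered interaction points to a condition for which data could be collected next. Practical domain models typically also include constraints between features, which this sketch omits.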
