Taxonomy of Real Faults in Deep Learning Systems

The growing application of deep neural networks in safety-critical domains makes the analysis of faults that occur in such systems of enormous importance. In this paper we introduce a large taxonomy of faults in deep learning (DL) systems. We have manually analysed 1059 artefacts gathered from GitHub commits and issues of projects that use the most popular DL frameworks (TensorFlow, Keras and PyTorch) and from related Stack Overflow posts. Structured interviews with 20 researchers and practitioners describing the problems they have encountered in their experience have enriched our taxonomy with a variety of additional faults that did not emerge from the other two sources. Our final taxonomy was validated with a survey involving an additional set of 21 developers, confirming that almost all fault categories (13/15) were experienced by at least 50% of the survey participants.

[1]  Bente Anda,et al.  Experiences from conducting semi-structured interviews in empirical software engineering research , 2005, 11th IEEE International Software Metrics Symposium (METRICS'05).

[2]  David Lo,et al.  An Empirical Study of Bugs in Machine Learning Systems , 2012, 2012 IEEE 23rd International Symposium on Software Reliability Engineering.

[3]  Ivica Crnkovic,et al.  A Taxonomy of Software Engineering Challenges for Machine Learning Systems: An Empirical Investigation , 2019, XP.

[4]  Lei Ma,et al.  DeepMutation: Mutation Testing of Deep Learning Systems , 2018, 2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE).

[5]  Carolyn B. Seaman,et al.  Qualitative Methods in Empirical Studies of Software Engineering , 1999, IEEE Trans. Software Eng..

[6]  Jun Wan,et al.  MuNN: Mutation Analysis of Neural Networks , 2018, 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C).

[7]  Forrest Shull,et al.  Defect categorization: making use of a decade of widely varying historical data , 2008, ESEM '08.

[8]  Yifan Chen,et al.  An empirical study on TensorFlow program bugs , 2018, ISSTA.

[9]  Hridesh Rajan,et al.  A comprehensive study on deep learning bug characteristics , 2019, ESEC/SIGSOFT FSE.

[10]  Jan Bosch,et al.  Software Engineering Challenges of Deep Learning , 2018, 2018 44th Euromicro Conference on Software Engineering and Advanced Applications (SEAA).

[11]  Birger Hjørland,et al.  Organizing Knowledge. An Introduction to Managing Access to Information , 2009, J. Documentation.

[12]  Lionel C. Briand,et al.  Is mutation an appropriate tool for testing experiments? , 2005, ICSE.

[13]  Harald C. Gall,et al.  Populating a Release History Database from version control and bug tracking systems , 2003, International Conference on Software Maintenance, 2003. ICSM 2003. Proceedings..

[14]  Bin Li,et al.  An Empirical Study on Real Bugs for Machine Learning Programs , 2017, 2017 24th Asia-Pacific Software Engineering Conference (APSEC).

[15]  Michael D. Ernst,et al.  Are mutants a valid substitute for real faults in software testing? , 2014, SIGSOFT FSE.

[16]  Bastin Tony Roy Savarimuthu,et al.  Crowdsourced Knowledge on Stack Overflow: A Systematic Mapping Study , 2017, EASE.

[17]  Emilia Mendes,et al.  Taxonomies in software engineering: A Systematic mapping study and a revised taxonomy development method , 2017, Inf. Softw. Technol..

[18]  Pascale Thévenod-Fosse,et al.  Software error analysis: a real case study involving real faults and mutations , 1996, ISSTA '96.

[19]  Boris Beizer,et al.  Software System Testing and Quality Assurance , 1984 .

[20]  MendesEmilia,et al.  Taxonomies in software engineering , 2017 .