Understanding the Nature of System-Related Issues in Machine Learning Frameworks: An Exploratory Study

Modern software systems are built using development frameworks, and these frameworks have a major impact on how the resulting system executes, how its configuration is managed, how it is tested, and how and where it is deployed. Machine learning (ML) frameworks, and the systems developed with them, differ greatly from their traditional counterparts. Naturally, the issues that manifest in such frameworks may differ as well, as may the behavior of the developers addressing those issues. We are interested in characterizing the system-related issues (issues impacting performance, memory and resource usage, and other quality attributes) that emerge in ML frameworks, and in how they differ from those in traditional frameworks. We conducted a moderate-scale exploratory study analyzing real-world system-related issues from 10 popular ML frameworks. Our findings offer implications for the development of ML systems, including differences in how frequently certain issue types occur, observations on how debate and elapsed time affect issue correction, and differences in developer specialization. We hope this exploratory study will help developers set realistic expectations, plan for risk, and allocate resources accordingly when using the tools these frameworks provide to build ML-based systems.
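To make the "system-related issue" category concrete, the sketch below illustrates one classic memory- and resource-usage pitfall in ML code. It is a hypothetical example written against PyTorch (one of many ML frameworks) and is not drawn from the study's issue data: accumulating a loss tensor across training iterations retains each step's autograd graph, so memory grows with every step until the process exhausts RAM or GPU memory.

    import torch

    # Minimal training loop illustrating a well-known PyTorch pitfall:
    # summing the loss *tensor* (rather than a Python float) keeps every
    # iteration's autograd graph alive, so memory usage grows each step.
    model = torch.nn.Linear(128, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    total_loss = 0.0
    for step in range(1000):
        x = torch.randn(32, 128)
        y = torch.randn(32, 1)

        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

        # Leaky version: total_loss += loss   (retains the graph)
        # Correct version: .item() converts to a float, letting the
        # graph for this step be freed.
        total_loss += loss.item()

    print(f"mean loss: {total_loss / 1000:.4f}")

Issues of this kind often surface as resource-usage bug reports filed against the framework itself, even when the root cause is how the framework's API was used, which is part of what makes system-related issues in ML frameworks worth characterizing separately.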
