How Does Machine Learning Change Software Development Practices?

Giving a system the ability to learn inherently introduces non-determinism into it. Given the rising popularity of incorporating machine learning into systems, we investigated how this addition alters software development practices. We conducted a mixture of qualitative and quantitative studies with 14 interviewees and 342 survey respondents from 26 countries across four continents to elicit significant differences between the development of machine learning systems and the development of non-machine-learning systems. Our study uncovers significant differences in various aspects of software engineering (e.g., requirements, design, testing, and process) and work features (e.g., skill variety, problem solving, and task identity). Based on our findings, we highlight future research directions and provide recommendations for practitioners.
