Questions for data scientists in software engineering: a replication

In 2014, a Microsoft study investigated the kinds of questions that data science applied to software engineering should answer. It produced 145 questions that developers considered relevant for data scientists to answer, thus providing a research agenda for the community. Five years later, no further study has investigated whether the questions from the software engineers at Microsoft hold for other software companies, including software-intensive companies with a different primary focus (which we refer to as software-defined enterprises). Furthermore, it is not evident that the problems identified five years ago are still applicable, given the technological advances in software engineering. This paper presents a study at ING, a software-defined enterprise in banking in which over 15,000 IT staff provide in-house software solutions. It offers a comprehensive guide of questions for data scientists, selected from the previous study at Microsoft along with our current work at ING. We replicated the original Microsoft study at ING, looking for questions that affect both software companies and software-defined enterprises and that continue to affect software engineering. We also add new questions that emerged from differences in the context of the two companies and the five-year gap between the studies. Our results show that software engineering questions for data scientists in the software-defined enterprise are largely similar to those in the software company, albeit with exceptions. We hope that the software engineering research community builds on the new list of questions to create a useful body of knowledge.
