Machine Learning Model Development from a Software Engineering Perspective: A Systematic Literature Review

Data scientists often develop machine learning (ML) models to solve a variety of problems in industry and academia, but not without facing several challenges in model development. Part of the problem is that these professionals often do not realize that they rely on ad hoc practices that could be improved by adopting activities from the software engineering development lifecycle. Of course, since ML systems differ from traditional software systems, some differences in their respective development processes are to be expected. In this context, this paper investigates the challenges and practices that emerge during the development of ML models from a software engineering perspective, focusing on how software developers could benefit from applying or adapting the traditional software engineering process to the ML workflow.
