Towards Guidelines for Assessing Qualities of Machine Learning Systems

Systems containing components based on machine learning (ML) methods are becoming increasingly widespread. To ensure that a software system behaves as intended, standards such as ISO/IEC 25010 define the necessary quality aspects of the system and its components. Because ML components differ in nature from conventional software, we have to adjust these quality aspects or add new ones (such as trustworthiness), be precise about which aspect is relevant for which object of interest (such as completeness of training data), and determine how to objectively assess adherence to quality requirements. In this article, we present the construction of a quality model (i.e., evaluation objects, quality aspects, and metrics) for an ML system based on an industrial use case. This quality model enables practitioners to specify and objectively assess quality requirements for ML systems of this kind. In the future, we want to learn how the notion of quality differs across types of ML systems and derive general guidelines for specifying and assessing qualities of ML systems.
