Paper information - Technology Readiness Levels for Machine Learning Systems

The development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and treated as a means to an end. This lack of diligence can lead to technical debt, scope creep and misaligned objectives, model misuse and failures, and expensive consequences. Engineering systems, on the other hand, follow well-defined processes and testing standards to streamline development for high-quality, reliable results. The extreme case is spacecraft systems, where mission-critical measures and robustness are ingrained in the development process. Drawing on experience in both spacecraft engineering and AI/ML (from research through product), we propose a proven systems engineering approach for machine learning development and deployment. Our Technology Readiness Levels for ML (TRL4ML) framework defines a principled process to ensure robust systems while being streamlined for ML research and product, including key distinctions from traditional software engineering. Moreover, TRL4ML defines a common language for people across the organization to work collaboratively on ML technologies.

Alexander Lavin | Gregory Renard
