Towards a More Reliable Interpretation of Machine Learning Outputs for Safety-Critical Systems using Feature Importance Fusion

When machine learning supports decision-making in safety-critical systems, it is important to verify and understand the reasons why a particular output is produced. Although feature importance calculation approaches assist in interpretation, there is a lack of consensus regarding how the importance of features is quantified, which makes the explanations offered for the outcomes mostly unreliable. A possible solution to this lack of agreement is to combine the results from multiple feature importance quantifiers to reduce the variance of the estimates. Our hypothesis is that this will lead to more robust and trustworthy interpretations of the contribution of each feature to machine learning predictions. To help test this hypothesis, we propose an extensible framework divided into four main parts: (i) traditional data pre-processing and preparation for predictive machine learning models; (ii) predictive machine learning; (iii) feature importance quantification; and (iv) feature importance decision fusion using an ensemble strategy. We also introduce a novel fusion metric and compare it to the state of the art. Our approach is tested on synthetic data, where the ground truth is known. We compare different fusion approaches and their results for both training and test sets. We also investigate how different characteristics within the datasets affect the feature importance ensembles studied. Results show that our feature importance ensemble framework overall produces 15% lower feature importance error than existing methods. Additionally, results reveal that different levels of noise in the datasets do not affect the feature importance ensembles' ability to accurately quantify feature importance, whereas the feature importance quantification error increases with the number of features and the number of orthogonal informative features.
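The four-part pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of scikit-learn, the two models, permutation importance and random-forest impurity importance as quantifiers, and simple mean or median fusion are all assumptions made for illustration, and the novel fusion metric mentioned in the abstract is not reproduced here. The helper names (`normalise`, `fuse_importances`, `run_sketch`) are hypothetical. The ground truth is taken to be the normalised absolute coefficients of the synthetic linear generator, which is itself an assumption about how importance error is measured.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def normalise(scores):
    """Scale absolute importances so they sum to 1 (hypothetical helper)."""
    scores = np.abs(np.asarray(scores, dtype=float))
    total = scores.sum()
    return scores / total if total > 0 else scores


def fuse_importances(importance_list, how="mean"):
    """Fuse several importance vectors by mean or median (assumed strategies)."""
    stacked = np.vstack([normalise(s) for s in importance_list])
    if how == "median":
        fused = np.median(stacked, axis=0)
    else:
        fused = stacked.mean(axis=0)
    return normalise(fused)


def run_sketch(seed=0, how="mean"):
    # (i) Synthetic data with known generating coefficients, then scaling.
    X, y, coef = make_regression(
        n_samples=400, n_features=8, n_informative=4,
        noise=5.0, coef=True, random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
    # Assumed ground truth: coefficients rescaled to the standardised features.
    truth = normalise(coef * scaler.scale_)

    # (ii) Predictive models (an arbitrary pair chosen for illustration).
    linear = LinearRegression().fit(X_train, y_train)
    forest = RandomForestRegressor(n_estimators=50, random_state=seed)
    forest.fit(X_train, y_train)

    # (iii) Feature importance quantification on the test set.
    importances = []
    for model in (linear, forest):
        result = permutation_importance(
            model, X_test, y_test, n_repeats=5, random_state=seed
        )
        # Negative permutation scores are treated as zero importance.
        importances.append(np.clip(result.importances_mean, 0, None))
    importances.append(forest.feature_importances_)

    # (iv) Decision fusion and error against the assumed ground truth.
    fused = fuse_importances(importances, how=how)
    errors = [float(np.abs(normalise(s) - truth).mean()) for s in importances]
    fused_error = float(np.abs(fused - truth).mean())
    return fused, fused_error, errors


if __name__ == "__main__":
    fused, fused_error, errors = run_sketch()
    print("individual errors:", np.round(errors, 4))
    print("fused error:", round(fused_error, 4))
```

With mean fusion, the fused mean absolute error cannot exceed the average of the individual errors (by the triangle inequality), which gives some intuition for why combining quantifiers may stabilise the estimates; whether it beats the best single quantifier depends on the data and the methods fused.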
