Variable importance analysis: A comprehensive review

Measuring variable importance for computational models or measured data is an important task in many applications. Variable importance analysis (VIA) techniques have been developed independently in many disciplines. It is therefore necessary to bring together the good practices of each discipline and to compare the relative merits of the methods, both to help practitioners choose the most suitable method for a given analysis purpose and to guide current research on VIA. To this end, seven groups of methods are reviewed: difference-based variable importance measures (VIMs), parametric regression and related VIMs, nonparametric regression techniques, hypothesis test techniques, variance-based VIMs, moment-independent VIMs, and graphic VIMs. The methods are compared on a numerical test example in two situations (independent and dependent cases). For ease of use, recommendations are provided for different types of applications, and packages and software implementing these VIA techniques are collected. Prospects for future study of VIA techniques are also proposed.
