Building Uncertainty Models on Top of Black-Box Predictive APIs

With the commoditization of machine learning, more and more off-the-shelf models are available as part of code libraries or cloud services. Typically, data scientists and other users apply these models as "black boxes" within larger projects. When regressing a scalar quantity, such APIs typically offer a predict() function that outputs the estimated target variable (often denoted $\hat{y}$ or, in code, y_hat). However, many real-world problems require a deviation interval or uncertainty score rather than a single point estimate. In other words, a mechanism is needed to answer the question "How confident is the system about that prediction?" Motivated by the lack of this capability in most predictive APIs designed for regression, we propose a method that adds an uncertainty score to every black-box prediction. Since the underlying model is not accessible, standard Bayesian approaches are not applicable; we therefore adopt an empirical approach and fit an uncertainty model using a labelled dataset $(x, y)$ and the black-box outputs $\hat{y}$. To be able to use any predictive system as a black box and adapt to its complex behaviours, we propose three variants of an uncertainty model based on deep networks: the first adds a heteroscedastic noise component to the black-box output, the second predicts the residuals of the black box, and the third performs quantile regression using deep networks. Experiments on real financial data containing an in-production black-box system and on two public datasets (energy forecasting and biological responses) illustrate and quantify how uncertainty scores can be added to black-box outputs.
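As an illustration of the first and third variants, the following is a minimal sketch (not the paper's implementation) of how a deep network can be fitted on top of a frozen black box: the network receives the features x together with the black-box output y_hat and is trained, via a Gaussian negative log-likelihood, to predict an input-dependent noise level around y_hat; a pinball loss for the quantile-regression variant is also shown. The layer sizes, the optimizer, and names such as black_box and build_noise_model are illustrative assumptions, not details taken from the paper.

    import numpy as np
    import tensorflow as tf

    # Hypothetical black box: any predictive API exposing predict().
    # y_hat = black_box.predict(x)

    def build_noise_model(n_features):
        # Deep network mapping [x, y_hat] to log sigma^2(x).
        inputs = tf.keras.Input(shape=(n_features + 1,))
        h = tf.keras.layers.Dense(64, activation="relu")(inputs)
        h = tf.keras.layers.Dense(64, activation="relu")(h)
        log_var = tf.keras.layers.Dense(1)(h)  # unconstrained log-variance
        return tf.keras.Model(inputs, log_var)

    def gaussian_nll(residual, log_var):
        # Negative log-likelihood of y ~ N(y_hat, sigma^2(x)) with the
        # mean fixed to the black-box output, so the target passed to
        # fit() is the residual y - y_hat.
        return 0.5 * (tf.exp(-log_var) * tf.square(residual) + log_var)

    def pinball_loss(tau):
        # Loss for the quantile-regression variant: an asymmetric
        # penalty whose minimiser is the tau-th conditional quantile.
        def loss(y_true, y_pred):
            e = y_true - y_pred
            return tf.reduce_mean(tf.maximum(tau * e, (tau - 1.0) * e))
        return loss

    # Fitting the heteroscedastic variant on the labelled set (x, y)
    # and the black-box outputs y_hat:
    # model = build_noise_model(x.shape[1])
    # model.compile(optimizer="adam", loss=gaussian_nll)
    # features = np.concatenate([x, y_hat.reshape(-1, 1)], axis=1)
    # model.fit(features, (y - y_hat).reshape(-1, 1), epochs=50, batch_size=128)
    #
    # Uncertainty score for a new prediction:
    # sigma = np.exp(0.5 * model.predict(features_new))

The residual-prediction variant (the second) follows the same pattern, with a network trained to regress y - y_hat directly rather than its noise level.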
