Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions

Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data. Yet, when using RNNs to inform decision-making, predictions by themselves are not sufficient; we also need estimates of predictive uncertainty. Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods; these are computationally prohibitive, and require major alterations to the RNN architecture and training. Capitalizing on ideas from classical jackknife resampling, we develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals. Our method derives predictive uncertainty from the variability of the (jackknife) sampling distribution of the RNN outputs, which is estimated by repeatedly deleting blocks of (temporally-correlated) training data, and collecting the predictions of the RNN re-trained on the remaining data. To avoid exhaustive re-training, we utilize influence functions to estimate the effect of removing training data blocks on the learned RNN parameters. Using data from a critical care setting, we demonstrate the utility of uncertainty quantification in sequential decision-making.
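The block-deletion idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: a ridge-regularised linear model stands in for the RNN so that the Hessian is available in closed form, and all names (`fit_ridge`, `leave_block_out_params`, `jackknife_std`), the synthetic data, and the delete-a-group jackknife variance formula are assumptions introduced for illustration. The sketch approximates each leave-block-out parameter vector with a first-order influence-function step instead of re-training, then summarises the spread of the resulting predictions.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Minimise (1/n) * sum_i 0.5*(x_i @ theta - y_i)**2 + 0.5*lam*||theta||^2."""
    n, d = X.shape
    hessian = X.T @ X / n + lam * np.eye(d)  # Hessian of the training objective
    theta = np.linalg.solve(hessian, X.T @ y / n)
    return theta, hessian

def leave_block_out_params(X, y, theta, hessian, block_size):
    """First-order influence estimate of the parameters after deleting each
    contiguous block of training points (no re-training involved)."""
    n = X.shape[0]
    residuals = X @ theta - y
    per_example_grads = X * residuals[:, None]  # gradient of each squared-error term
    approx = []
    for start in range(0, n, block_size):
        block_grad = per_example_grads[start:start + block_size].sum(axis=0)
        # Deleting a block corresponds to down-weighting it by 1/n per point.
        approx.append(theta + np.linalg.solve(hessian, block_grad) / n)
    return np.stack(approx)

def jackknife_std(values):
    """Delete-a-group jackknife standard error (assumes equal-sized blocks)."""
    k = len(values)
    return np.sqrt((k - 1) / k * np.sum((values - values.mean()) ** 2))

# Hypothetical time-ordered data with AR(1) noise, so neighbouring points are correlated.
rng = np.random.default_rng(0)
n, d, block_size, lam = 200, 3, 20, 1e-2
X = rng.normal(size=(n, d))
noise = np.zeros(n)
for t in range(1, n):
    noise[t] = 0.5 * noise[t - 1] + 0.1 * rng.normal()
y = X @ np.array([1.0, -2.0, 0.5]) + noise

theta_hat, H = fit_ridge(X, y, lam)
thetas = leave_block_out_params(X, y, theta_hat, H, block_size)

x_new = np.array([0.3, -0.1, 0.8])
preds = thetas @ x_new  # one prediction per deleted block
print("prediction:", x_new @ theta_hat, "jackknife std:", jackknife_std(preds))
```

For an actual RNN the Hessian cannot be formed or inverted explicitly, so the `np.linalg.solve` step would presumably be replaced by an approximate inverse-Hessian-vector product; that substitution, and the interval construction that carries the coverage guarantee, are outside what this sketch shows.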
