A Higher-Order Swiss Army Infinitesimal Jackknife

Cross validation (CV) and the bootstrap are ubiquitous model-agnostic tools for assessing the error or variability of machine learning and statistical estimators. However, these methods require repeatedly re-fitting the model to differently weighted versions of the original dataset, which can be prohibitively time-consuming. For sufficiently regular optimization problems, the optimum depends smoothly on the data weights, so the process of repeated re-fitting can be approximated with a Taylor series that can often be evaluated relatively quickly. The first-order approximation is known as the "infinitesimal jackknife" in the statistics literature and has been the subject of recent interest in machine learning for approximate CV. In this work, we consider higher-order approximations, which we call the "higher-order infinitesimal jackknife" (HOIJ). Under mild regularity conditions, we provide a simple recursive procedure to compute approximations of all orders with finite-sample accuracy bounds. Additionally, we show that the HOIJ can be computed efficiently even in high dimensions using forward-mode automatic differentiation. We show that a linear approximation evaluated at bootstrap weights is equivalent to that provided by asymptotic normal approximations. Consequently, the HOIJ opens up the possibility of enjoying higher-order accuracy properties of the bootstrap using local approximations. Consistency of the HOIJ for leave-one-out CV under different asymptotic regimes follows as a corollary of our finite-sample bounds under additional regularity assumptions. The generality of the computation and bounds motivates the name "higher-order Swiss Army infinitesimal jackknife."
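
To make the Taylor-series idea concrete, here is a minimal sketch of the first-order step only, i.e. the classical infinitesimal jackknife for approximate leave-one-out CV. It assumes JAX for automatic differentiation and a logistic-regression loss; the function names (loss, weighted_objective, ij_loo_parameters) are illustrative and not taken from the paper's code. Given a full-data optimum theta_hat of the weighted objective, it linearizes the optimum in the data weights; the HOIJ recursion described above extends this with higher-order directional derivatives, which forward-mode AD (e.g. jax.jvp) can evaluate without forming full derivative tensors.

```python
import jax
import jax.numpy as jnp


def loss(theta, x, y):
    # Per-datum loss: logistic-regression negative log-likelihood, labels y in {-1, +1}.
    return jnp.log1p(jnp.exp(-y * jnp.dot(x, theta)))


def weighted_objective(theta, w, X, Y):
    # Weighted empirical risk. w = (1, ..., 1) is the full-data fit;
    # setting w_i = 0 corresponds to leaving out datum i.
    per_datum = jax.vmap(loss, in_axes=(None, 0, 0))(theta, X, Y)
    return jnp.sum(w * per_datum)


def ij_loo_parameters(theta_hat, X, Y):
    # First-order infinitesimal jackknife. At the full-data weights,
    # d theta_hat / d w_i = -H^{-1} g_i, so the linearized leave-one-out
    # optimum (w_i: 1 -> 0) is theta_hat + H^{-1} g_i for each datum i.
    # Assumes theta_hat already minimizes the full-data objective.
    n = X.shape[0]
    w = jnp.ones(n)
    H = jax.hessian(weighted_objective)(theta_hat, w, X, Y)               # (d, d)
    G = jax.vmap(jax.grad(loss), in_axes=(None, 0, 0))(theta_hat, X, Y)   # (n, d)
    return theta_hat + jnp.linalg.solve(H, G.T).T                         # (n, d) of approximate LOO optima
```

A brute-force check simply re-fits with w_i set to zero and compares the refit to the corresponding row returned above; the higher-order terms of the HOIJ would be added to the same linearized estimate.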
