Discriminative Learning of Prediction Intervals

In this work we consider the task of constructing prediction intervals in an inductive batch setting. Most current methods for constructing prediction intervals offer guarantees for a single new test point; applying them to multiple test points can incur a high computational overhead and degrade the statistical guarantees. We present a discriminative learning framework that optimizes the expected error rate under a budget constraint on the interval sizes. By focusing on expected errors, our method allows for variability in the per-example conditional error rates. As we demonstrate both analytically and empirically, this flexibility can increase the overall accuracy or, alternatively, reduce the average interval size. While the problem we consider is of a regressive flavor, the loss we use is combinatorial, which allows us to provide PAC-style, finite-sample guarantees. Computationally, we show that the original objective is NP-hard, and suggest a tractable convex surrogate. We conclude with a series of experimental evaluations.
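The abstract describes the approach only at a high level. As a rough illustration of the general idea (not the paper's actual algorithm or surrogate), the sketch below fits a linear interval [x·w_lo, x·w_hi] by subgradient descent on a hinge surrogate of the 0/1 coverage error, with a penalty that keeps the average interval width near a budget B. All variable names, the synthetic data, and the hyperparameters (B, lam, lr) are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only -- NOT the paper's algorithm. We fit a linear
# interval [x @ w_lo, x @ w_hi] using a hinge surrogate of the 0/1 coverage
# error, penalizing average interval width whenever it exceeds a budget B.
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
X = np.column_stack([x, np.ones(n)])   # feature plus bias column
y = x + 0.2 * rng.normal(size=n)       # synthetic regression data

B, lam, lr = 1.0, 3.0, 0.05            # width budget, penalty weight, step size
w_lo = np.zeros(2)
w_hi = np.zeros(2)

for _ in range(4000):
    lo, hi = X @ w_lo, X @ w_hi
    # subgradients of the hinge losses max(0, lo - y) and max(0, y - hi)
    g_lo = X.T @ (lo > y).astype(float) / n
    g_hi = -X.T @ (y > hi).astype(float) / n
    # budget penalty lam * max(0, mean(hi - lo) - B): shrink the interval
    if np.mean(hi - lo) > B:
        g_lo -= lam * X.mean(axis=0)
        g_hi += lam * X.mean(axis=0)
    w_lo -= lr * g_lo
    w_hi -= lr * g_hi

lo, hi = X @ w_lo, X @ w_hi
coverage = float(np.mean((lo <= y) & (y <= hi)))   # empirical coverage rate
width = float(np.mean(hi - lo))                    # average interval size
```

The trade-off in the abstract shows up directly here: tightening the budget B lowers the average width at the cost of empirical coverage, while the hinge term penalizes only aggregate (expected) violations rather than enforcing a per-example guarantee.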
