Generalization Bounds in the Predict-then-Optimize Framework

The predict-then-optimize framework is fundamental in many practical settings: predict the unknown parameters of an optimization problem, and then solve the problem using the predicted values of the parameters. A natural loss function in this setting is the cost of the decisions induced by the predicted parameters, in contrast to the prediction error of the parameters themselves. This loss function was recently introduced by Elmachtoub and Grigas (2017), who called it the Smart Predict-then-Optimize (SPO) loss. Since the SPO loss is nonconvex and discontinuous, standard results for deriving generalization bounds do not apply. In this work, we provide an assortment of generalization bounds for the SPO loss function. In particular, we derive bounds based on the Natarajan dimension that, in the case of a polyhedral feasible region, scale at most logarithmically in the number of extreme points, but, in the case of a general convex set, have poor dependence on the dimension. By exploiting the structure of the SPO loss function and an additional strong convexity assumption on the feasible region, we can dramatically improve the dependence on the dimension via an analysis and corresponding bounds that are akin to margin guarantees in classification problems.
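
For reference, the SPO loss introduced by Elmachtoub and Grigas (2017) measures the excess true cost of the decision induced by a predicted cost vector relative to the full-information optimum. With S the feasible region, c the true cost vector, and ĉ the prediction, write w*(ĉ) for a solution of min_{w in S} ĉ^T w (ties in the argmin are broken adversarially in the original paper). The loss can then be written as

\ell_{\mathrm{SPO}}(\hat{c}, c) = c^\top w^*(\hat{c}) - \min_{w \in S} c^\top w.

Below is a minimal computational sketch of this quantity for a polyhedral feasible region, assuming SciPy's linprog as the linear programming solver; the function spo_loss and the box example are illustrative choices, not code from the paper.

import numpy as np
from scipy.optimize import linprog

def spo_loss(c_hat, c_true, A_ub, b_ub):
    # Decision induced by optimizing the *predicted* cost vector over S = {w : A_ub w <= b_ub}.
    w_hat = linprog(c_hat, A_ub=A_ub, b_ub=b_ub, bounds=(None, None)).x
    # Full-information optimal value under the true cost vector.
    z_star = linprog(c_true, A_ub=A_ub, b_ub=b_ub, bounds=(None, None)).fun
    # SPO loss: excess true cost incurred by acting on the prediction.
    return float(c_true @ w_hat - z_star)

# Example: S = [0, 1]^2. The predicted costs favor the decision (0, 1) while the
# true costs favor (1, 1), so the SPO loss equals 1.
A_ub = np.vstack([np.eye(2), -np.eye(2)])
b_ub = np.array([1.0, 1.0, 0.0, 0.0])
print(spo_loss(np.array([1.0, -1.0]), np.array([-1.0, -1.0]), A_ub, b_ub))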

[1] Jean-Philippe Vial, et al. Strong and Weak Convexity of Sets and Functions, 1983, Math. Oper. Res.

[2] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 2017.

[3] G. Ziegler, Lectures on Polytopes, 1994.

[4] Dmitry Panchenko, et al. Some New Bounds on the Generalization Error of Combined Classifiers, 2000, NIPS.

[5] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.

[6] V. Koltchinskii, et al. Empirical margin distributions and bounding the generalization error of combined classifiers, 2002, math/0405343.

[7] B. K. Natarajan, On Learning Sets and Functions, 1989, Machine Learning.

[8] Yann Guermeur, et al. VC Theory of Large Margin Multi-Category Classifiers, 2007, J. Mach. Learn. Res.

[9] Ambuj Tewari, et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization, 2008, NIPS.

[10] Yi-Hao Kao, et al. Directed Regression, 2009, NIPS.

[11] Yurii Nesterov, et al. Generalized Power Method for Sparse Principal Component Analysis, 2008, J. Mach. Learn. Res.

[12] Shai Ben-David, et al. Multiclass Learnability and the ERM principle, 2011, COLT.

[13] Ambuj Tewari, et al. Regularization Techniques for Learning with Matrices, 2009, J. Mach. Learn. Res.

[14] Ameet Talwalkar, et al. Foundations of Machine Learning, 2012, Adaptive Computation and Machine Learning.

[15] Amit Daniely, et al. Optimal learners for multiclass problems, 2014, COLT.

[16] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.

[17] Elad Hazan, et al. Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets, 2014, ICML.

[18] Alexander Binder, et al. Multi-class SVMs: From Tighter Data-Dependent Generalization Bounds to Novel Algorithms, 2015, NIPS.

[19] Dimitri P. Bertsekas, et al. Convex Optimization Algorithms, 2015.

[20] Andreas Maurer, et al. A Vector-Contraction Inequality for Rademacher Complexities, 2016, ALT.

[21] Trevor Hastie, et al. Statistical Learning with Sparsity: The Lasso and Generalizations, 2015.

[22] Priya L. Donti, et al. Task-based End-to-end Model Learning, 2017, ArXiv.

[23] Priya L. Donti, et al. Task-based End-to-end Model Learning in Stochastic Optimization, 2017, NIPS.

[24] Yong Liu, et al. Multi-Class Learning: From Theory to Algorithm, 2018, NeurIPS.

[25] Milind Tambe, et al. Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization, 2018, AAAI.

[26] P. R. Srivastava, et al. On Data-Driven Prescriptive Analytics with Side Information: A Regularized Nadaraya-Watson Approach, 2021, arXiv:2110.04855.

[27] Dylan J. Foster, et al. ℓ∞ Vector Contraction for Rademacher Complexity, 2019, ArXiv.

[28] Milind Tambe, et al. End to end learning and optimization on graphs, 2019, NeurIPS.

[29] James Bailey, et al. Predict+Optimise with Ranking Objectives: Exhaustively Learning Linear Functions, 2019, IJCAI.

[30] Cynthia Rudin, et al. The Big Data Newsvendor: Practical Insights from Machine Learning, 2013, Oper. Res.

[31] Francis Bach, et al. Learning with Differentiable Perturbed Optimizers, 2020, ArXiv.

[32] Dimitris Bertsimas, et al. From Predictive to Prescriptive Analytics, 2014, Manag. Sci.

[33] Milind Tambe, et al. Automatically Learning Compact Quality-aware Surrogates for Optimization Problems, 2020, NeurIPS.

[34] Tias Guns, et al. Smart Predict-and-Optimize for Hard Combinatorial Optimization Problems, 2019, AAAI.

[35] Milind Tambe, et al. MIPaaL: Mixed Integer Program as a Layer, 2019, AAAI.

[36] Adam N. Elmachtoub, et al. Decision Trees for Decision-Making under the Predict-then-Optimize Framework, 2020, ICML.

[37] Kotagiri Ramamohanarao, et al. Dynamic Programming for Predict+Optimise, 2020, AAAI.

[38] Kjetil Fagerholt, et al. A semi-"smart predict then optimize" (semi-SPO) method for efficient ship inspection, 2020.

[39] Tias Guns, et al. Interior Point Solving for LP-based prediction+optimisation, 2020, NeurIPS.

[40] Hongrui Chu, et al. Data-driven optimization for last-mile delivery, 2021, Complex & Intelligent Systems.

[41] Ferdinando Fioretto, et al. End-to-End Constrained Optimization Learning: A Survey, 2021, IJCAI.

[42] G. Loke, A Blended Model for Predict-then-Optimize, 2021.

[43] Adam N. Elmachtoub, et al. Smart "Predict, then Optimize", 2017, Manag. Sci.

[44] Nathan Kallus, et al. Fast Rates for Contextual Linear Optimization, 2020, Manag. Sci.

[45] Nam Ho-Nguyen, et al. Risk Guarantees for End-to-End Prediction and Optimization Processes, 2020, Manag. Sci.