An Online-Learning Approach to Inverse Optimization

In this paper, we demonstrate how to learn the objective function of a decision-maker while observing only the problem input data and the decision-maker's corresponding decisions over multiple rounds. Our approach is based on online learning and works for linear objectives over arbitrary feasible sets for which we have a linear optimization oracle. As such, it generalizes previous approaches based on KKT-system decomposition and dualization. We present two exact algorithms, based on multiplicative weights updates and online gradient descent, respectively. Both converge at a rate of O(1/sqrt(T)) in the number of rounds T, and thus allow us to make decisions that are essentially as good as those of the observed decision-maker after relatively few observations. We also discuss several useful generalizations, such as the approximate learning of non-linear objective functions and the case of suboptimal observations. Finally, we show the effectiveness and possible applications of our methods in a broad computational study.
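To make the setting concrete, the following is a minimal sketch of a multiplicative-weights learner for this kind of problem. It is an illustration under stated assumptions, not the paper's exact algorithm: the objective is assumed to be maximized and normalized to the probability simplex, each round's feasible set is assumed to be a small finite list of points, and the names `linear_oracle`, `mwu_inverse_optimization`, and `make_rounds` as well as the step-size choice are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def linear_oracle(c, feasible_points):
    """Hypothetical oracle: return a point maximizing c^T x over a finite set."""
    return max(feasible_points, key=lambda x: dot(c, x))


def mwu_inverse_optimization(rounds, n, eta=None):
    """Learn an objective estimate from (feasible set, expert decision) pairs.

    Assumption: the true objective lies on the simplex (nonnegative, sums to 1).
    Returns the final estimate and the per-round gap c_hat^T (x_learner - x_expert).
    """
    if eta is None:
        # Assumed step size of order sqrt(log n / T), capped at 1/2.
        eta = min(0.5, math.sqrt(math.log(n) / max(1, len(rounds))))
    c_hat = [1.0 / n] * n  # start from the uniform objective
    gaps = []
    for feasible_points, x_expert in rounds:
        # Decide using the current estimate, then compare with the expert.
        x_learner = linear_oracle(c_hat, feasible_points)
        diff = [xl - xe for xl, xe in zip(x_learner, x_expert)]
        gaps.append(dot(c_hat, diff))
        # Shrink weights on coordinates where the learner overshot the expert,
        # grow them where it undershot; rescale so each factor stays positive.
        scale = max(abs(d) for d in diff) or 1.0
        c_hat = [c * (1.0 - eta * d / scale) for c, d in zip(c_hat, diff)]
        total = sum(c_hat)
        c_hat = [c / total for c in c_hat]
    return c_hat, gaps


def make_rounds(c_true, num_rounds, points_per_round, seed=0):
    """Synthetic data: random finite feasible sets with an optimal expert."""
    rng = random.Random(seed)
    n = len(c_true)
    rounds = []
    for _ in range(num_rounds):
        points = [[rng.random() for _ in range(n)] for _ in range(points_per_round)]
        rounds.append((points, linear_oracle(c_true, points)))
    return rounds


if __name__ == "__main__":
    c_true = [0.5, 0.3, 0.2]
    rounds = make_rounds(c_true, num_rounds=500, points_per_round=5)
    c_hat, gaps = mwu_inverse_optimization(rounds, n=len(c_true))
    print("estimate:", [round(c, 3) for c in c_hat])
    print("average gap:", sum(gaps) / len(gaps))
```

The learner only ever sees the feasible points and the expert's choice, never `c_true`. The per-round gap is nonnegative because the learner's point is optimal for its own estimate; in this sketch it serves as a rough proxy for how far the learner's decisions are from the expert's.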
