Emulating the Expert: Inverse Optimization through Online Learning

In this paper, we demonstrate how to learn the objective function of a decision maker while observing only the problem input data and the decision maker’s corresponding decisions over multiple rounds. Our approach is based on online learning techniques and works for linear objectives over arbitrary sets for which we have a linear optimization oracle; as such, it generalizes previous work based on KKT-system decomposition and dualization approaches. We also briefly discuss the applicability of our framework to learning linear constraints. Our algorithm converges at a rate of $O(1/\sqrt{T})$, and we demonstrate its effectiveness and applications in preliminary computational results.
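The setting described above can be illustrated with a minimal sketch. It is not the paper's algorithm as stated, only one plausible instance of the idea under several assumptions: the hidden objective `c_true` lies in the probability simplex, every feasible point lies in $[0,1]^n$, each round's feasible set is a small finite list so the linear optimization oracle can be a brute-force maximum, and the learner uses a standard multiplicative weights update. The names `dot`, `oracle`, and `emulate_expert` are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def oracle(c, points):
    # Stand-in linear optimization oracle: maximize c . x over a finite set.
    # Assumption: any exact maximizer over the round's feasible set would do.
    return max(points, key=lambda x: dot(c, x))


def emulate_expert(rounds, c_true):
    """Learn an objective from expert decisions via multiplicative weights.

    rounds: list of feasible sets, each a list of points in [0, 1]^n.
    c_true: hidden expert objective; used here only to simulate the expert
            and to measure how suboptimal the learner's decisions are.
    """
    n, T = len(c_true), len(rounds)
    eta = min(0.5, math.sqrt(math.log(n) / T))  # assumed step size
    c = [1.0 / n] * n                           # uniform initial estimate
    total_gap = 0.0
    for points in rounds:
        x_hat = oracle(c, points)        # learner's decision under its estimate
        x_exp = oracle(c_true, points)   # expert's decision (observed in practice)
        total_gap += dot(c_true, x_exp) - dot(c_true, x_hat)
        # Shrink weights where the learner over-invested relative to the expert.
        c = [ci * (1.0 - eta * (xh - xe)) for ci, xh, xe in zip(c, x_hat, x_exp)]
        s = sum(c)
        c = [ci / s for ci in c]         # renormalize onto the simplex
    return c, total_gap / T


random.seed(0)
c_true = [0.1, 0.2, 0.3, 0.4]
rounds = [[[random.random() for _ in c_true] for _ in range(5)] for _ in range(400)]
c_learned, avg_gap = emulate_expert(rounds, c_true)
print(c_learned, avg_gap)
```

Under these assumptions, the standard multiplicative weights analysis bounds the average gap between the expert's and the learner's decisions, measured in the true objective, by $2\sqrt{\ln n / T}$, which matches the $O(1/\sqrt{T})$ rate quoted in the abstract.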
