An Online-Learning Approach to Inverse Optimization

In this paper, we demonstrate how to learn the objective function of a decision-maker while only observing the problem input data and the decision-maker's corresponding decisions over multiple rounds. Our approach is based on online learning and works for linear objectives over arbitrary feasible sets for which we have a linear optimization oracle. As such, it generalizes previous approaches based on KKT-system decomposition and dualization. The two exact algorithms we present, based on multiplicative weights updates and online gradient descent respectively, converge at a rate of O(1/sqrt(T)) in the number of observations T. They therefore allow us to take decisions that are essentially as good as those of the observed decision-maker after relatively few observations. We also discuss several useful generalizations, such as the approximate learning of non-linear objective functions and the case of suboptimal observations. Finally, we demonstrate the effectiveness and possible applications of our methods in a broad computational study.
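The Python sketch below illustrates the online-gradient-descent idea described in the abstract. It relies on simplifying assumptions that are ours, not the paper's. The feasible set of each round is a finite list of points, so the linear optimization oracle reduces to a plain argmax. The unknown objective is assumed to lie in the probability simplex, and the step size is fixed. The names `linear_oracle`, `project_to_simplex` and `learn_objective_ogd` are hypothetical. In each round the learner decides using its current guess c and compares its decision with the observed one. It then takes a projected gradient step on the linear loss c^T(x_learner - x_expert).

```python
import numpy as np


def linear_oracle(c, points):
    """Hypothetical linear optimization oracle (assumption): return
    argmax_x c^T x over a finite feasible set given as the rows of `points`."""
    return points[np.argmax(points @ c)]


def project_to_simplex(v):
    """Euclidean projection of v onto {c : c >= 0, sum(c) = 1},
    using the standard sort-based method."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def learn_objective_ogd(rounds, dim, eta):
    """Sketch of online gradient descent for inverse linear optimization.

    rounds: list of (points, x_expert) pairs, where `points` is the feasible
    set of that round and x_expert is the observed (optimal) decision.
    Returns the final objective guess and the learner's decision per round.
    """
    c = np.full(dim, 1.0 / dim)  # initial guess: uniform weights
    learner_decisions = []
    for points, x_expert in rounds:
        x_learner = linear_oracle(c, points)  # decide with the current guess
        learner_decisions.append(x_learner)
        # The loss l_t(c) = c^T (x_learner - x_expert) is linear in c,
        # so its gradient is x_learner - x_expert; step, then project back.
        c = project_to_simplex(c - eta * (x_learner - x_expert))
    return c, learner_decisions


# Toy demo with synthetic data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
dim, T = 3, 2000
c_true = np.array([0.6, 0.3, 0.1])  # hidden objective of the "expert"
rounds = []
for _ in range(T):
    points = rng.random((10, dim))  # random finite feasible set in [0, 1]^dim
    rounds.append((points, linear_oracle(c_true, points)))

# Step size from the standard OGD analysis: diameter bound sqrt(2),
# gradient-norm bound sqrt(dim) for points in [0, 1]^dim.
eta = np.sqrt(2.0) / (np.sqrt(dim) * np.sqrt(T))
c_hat, decisions = learn_objective_ogd(rounds, dim, eta)

# True suboptimality of each learner decision, measured with c_true.
gaps = [c_true @ (x_e - x_l) for (_, x_e), x_l in zip(rounds, decisions)]
print("average gap:", np.mean(gaps))
```

Every loss here is linear in c, and c_t^T(x_learner - x_expert) is nonnegative in every round. A standard online-gradient-descent regret argument therefore bounds the average true gap by sqrt(2)·sqrt(dim)/sqrt(T) for this choice of eta. This matches the O(1/sqrt(T)) behaviour stated in the abstract. A multiplicative-weights variant would replace the projection step with an exponential reweighting of c.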