Watch and learn: optimizing from revealed preferences feedback

A Stackelberg game is played between a leader and a follower. The leader first chooses an action, the follower then plays his best response, and the leader's goal is to pick the action that maximizes his own payoff given that best response. Stackelberg games capture, for example, the following interaction between a retailer and a buyer. The retailer chooses the prices of the goods he produces, and the buyer then buys a utility-maximizing bundle of goods. The retailer's goal is to set prices that maximize his profit, that is, his revenue minus the production cost of the purchased bundle. It is natural that the retailer in this example does not know the buyer's utility function. He does, however, have access to revealed preference feedback: he can set prices and then observe the purchased bundle and his own profit. We give algorithms that efficiently solve, in terms of both computational and query complexity, a broad class of Stackelberg games in which the follower's utility function is unknown, using only "revealed preference" access to it. This class includes the profit maximization problem, as well as the optimal tolling problem in nonatomic congestion games when the latency functions are unknown. Surprisingly, we are able to solve these problems even though the corresponding maximization problems are not concave in the leader's actions.
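To make the retailer and buyer interaction concrete, the following is a minimal sketch of revealed preference feedback. It is not the algorithm of the paper. The quasilinear square-root utility, the linear production cost, the naive grid search, and all names (`buyer_best_response`, `revealed_preference_query`, `grid_search_prices`, `a`, `unit_cost`) are assumptions chosen for illustration only.

```python
import itertools

import numpy as np


def buyer_best_response(prices, a):
    """Buyer's side, hidden from the retailer.

    Assumed utility: sum_i a_i * sqrt(x_i) - prices @ x.
    Setting the gradient to zero gives x_i = (a_i / (2 * p_i)) ** 2.
    """
    return (a / (2.0 * prices)) ** 2


def revealed_preference_query(prices, a, unit_cost):
    """What the retailer observes after posting prices.

    Returns the purchased bundle and the retailer's profit
    (revenue minus an assumed linear production cost).
    """
    bundle = buyer_best_response(prices, a)
    profit = float(prices @ bundle - unit_cost @ bundle)
    return bundle, profit


def grid_search_prices(a, unit_cost, grid):
    """Naive baseline: query every price vector on a grid, keep the best.

    The retailer only uses the observed profit; it never reads `a`
    except by passing it through to the simulated buyer.
    """
    best_prices, best_profit = None, -np.inf
    for candidate in itertools.product(grid, repeat=len(unit_cost)):
        prices = np.array(candidate)
        _, profit = revealed_preference_query(prices, a, unit_cost)
        if profit > best_profit:
            best_prices, best_profit = prices, profit
    return best_prices, best_profit


if __name__ == "__main__":
    a = np.array([1.0, 2.0])          # hypothetical buyer parameters
    unit_cost = np.array([0.5, 1.0])  # hypothetical per-unit costs
    grid = np.linspace(0.25, 4.0, 16)
    print(grid_search_prices(a, unit_cost, grid))
```

Under these assumed forms, the profit from good i is (p_i - c_i) a_i^2 / (4 p_i^2), which is not concave in p_i, and it is maximized at p_i = 2 c_i. The grid search uses a number of queries exponential in the number of goods, which is the kind of cost that query-efficient algorithms are meant to avoid.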
