Learning to Optimize

We consider decision-making by boundedly-rational agents in dynamic stochastic environments. The behavioral primitive is anchored to the shadow price of the state vector. Our agent forecasts the value of an additional unit of the state tomorrow using estimated models of shadow prices and transition dynamics, and uses this forecast to choose her control today. The control decision, together with the agent’s forecast of tomorrow’s shadow price, is then used to update the perceived shadow price of today’s states. By following this boundedly optimal procedure, the agent’s decision rule converges over time to the optimal policy. Specifically, within standard linear-quadratic environments, we obtain general conditions for asymptotically optimal decision-making: agents learn to optimize. Our results carry over to closely related procedures based on value-function learning and Euler-equation learning. We provide examples showing that shadow-price learning extends to general dynamic-stochastic decision-making environments and embeds naturally in general-equilibrium models.

JEL Classifications: E52; E31; D83; D84
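The forecast, choose, update loop described in the abstract can be illustrated with a minimal sketch for a scalar linear-quadratic problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the agent maximizes the negative of the discounted sum of q*x^2 + r*u^2 subject to x' = a*x + b*u + noise, perceives the shadow price as lambda = H*x, and knows a and b. The least-squares estimation step is replaced by a deterministic damped update with a hypothetical constant `gain`, so this shows only the fixed-point logic, not the stochastic learning algorithm itself.

```python
def control_coefficient(H, a, b, r, beta):
    """Coefficient F in u = F*x implied by the first-order condition
    -2*r*u + beta*b*H*(a*x + b*u) = 0, given perceived shadow price H*x."""
    return beta * a * b * H / (2.0 * r - beta * b * b * H)


def implied_shadow_price(H, a, b, q, r, beta):
    """Shadow-price coefficient implied by the envelope condition
    lambda_t = -2*q*x_t + beta*a*E[lambda_{t+1}] when the agent acts on H."""
    F = control_coefficient(H, a, b, r, beta)
    return -2.0 * q + beta * a * H * (a + b * F)


def learn_shadow_price(a=0.9, b=0.5, q=1.0, r=1.0, beta=0.95,
                       gain=0.2, steps=500, H0=0.0):
    """Damped update of the perceived coefficient toward the implied one.
    Assumption: stands in for recursive least-squares updating."""
    H = H0
    for _ in range(steps):
        H += gain * (implied_shadow_price(H, a, b, q, r, beta) - H)
    return H


def riccati_shadow_price(a, b, q, r, beta, iters=10000):
    """Benchmark: iterate the discounted scalar Riccati equation for P,
    where V(x) = -P*x^2, and return the optimal coefficient -2*P."""
    P = q
    for _ in range(iters):
        P = q + beta * a * a * P - (beta * a * b * P) ** 2 / (r + beta * b * b * P)
    return -2.0 * P


if __name__ == "__main__":
    H_learned = learn_shadow_price()
    H_optimal = riccati_shadow_price(0.9, 0.5, 1.0, 1.0, 0.95)
    print(H_learned, H_optimal)
```

Under these assumed parameter values the learned coefficient should agree with the Riccati benchmark, which is the scalar analogue of the decision rule converging to the optimal policy.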
