Computational Rationalization: The Inverse Equilibrium Problem

Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task. When restricted to the single-agent decision-theoretic setting, inverse optimal control techniques assume that observed behavior is an approximately optimal solution to an unknown decision problem. These techniques learn a utility function that explains the example behavior and can then be used to accurately predict or imitate future behavior in similar observed or unobserved situations. In this work, we consider similar tasks in competitive and cooperative multi-agent domains. Here, unlike single-agent settings, a player cannot myopically maximize its reward; it must speculate on how the other agents may act to influence the game's outcome. Employing the game-theoretic notion of regret and the principle of maximum entropy, we introduce a technique for predicting and generalizing behavior.
