Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy

Predicting human behavior from a small number of training examples is a challenging machine learning problem. In this thesis, we introduce the principle of maximum causal entropy, a general technique for applying information theory to decision-theoretic, game-theoretic, and control settings where relevant information is sequentially revealed over time. This approach guarantees decision-theoretic performance by matching purposeful measures of behavior (Abbeel & Ng, 2004) and/or enforcing game-theoretic rationality constraints (Aumann, 1974), while otherwise remaining as uncertain as possible, thereby minimizing worst-case predictive log-loss (Grünwald & Dawid, 2003). Using this approach, we derive probabilistic models for decision, control, and multi-player game settings. We then develop corresponding algorithms for efficient inference, including relaxations of the Bellman equation (Bellman, 1957), and simple learning algorithms based on convex optimization. We apply these models and algorithms to a range of behavior prediction tasks: vehicle route preference modeling using over 100,000 miles of collected taxi driving data, pedestrian motion modeling from weeks of indoor movement data, and robust prediction of game play in stochastic multi-player games.
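
As a schematic sketch of the construction in the single-agent Markov decision setting (using standard notation for states $s$, actions $a$, feature functions $f$, and reward weights $\theta$, none of which are defined in the abstract itself, and omitting the full causally conditioned formulation developed in the thesis), the principle selects the stochastic policy that is maximally uncertain about actions while matching observed feature counts:

\[
\max_{\pi}\ \mathbb{E}\Big[-\sum_{t} \log \pi(A_t \mid S_t)\Big]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\Big[\sum_{t} f(S_t, A_t)\Big] \;=\; \tilde{\mathbb{E}}\Big[\sum_{t} f(S_t, A_t)\Big].
\]

The resulting policy has a Boltzmann-like form governed by a softened Bellman recursion, in which a log-sum-exp ("softmax") relaxes the maximization of the ordinary Bellman equation:

\[
\pi_\theta(a \mid s) = e^{\,Q^{\mathrm{soft}}_\theta(s,a) - V^{\mathrm{soft}}_\theta(s)}, \qquad
Q^{\mathrm{soft}}_\theta(s,a) = \theta^{\top} f(s,a) + \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\big[ V^{\mathrm{soft}}_\theta(s') \big], \qquad
V^{\mathrm{soft}}_\theta(s) = \log \sum_{a} e^{\,Q^{\mathrm{soft}}_\theta(s,a)}.
\]

Fitting $\theta$ is then a convex optimization problem whose gradient is the difference between empirical and model feature expectations.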

[1]  Arturo Rosenblueth,et al.  Behavior, Purpose and Teleology , 1943, Philosophy of Science.

[2]  Richard J. K. Taylor Purposeful and Non-Purposeful Behavior: A Rejoinder , 1950, Philosophy of Science.

[3]  Claude E. Shannon,et al.  The zero error capacity of a noisy channel , 1956, IRE Trans. Inf. Theory.

[4]  E. Jaynes Information Theory and Statistical Mechanics , 1957 .

[5]  R. Bellman A Markovian Decision Process , 1957 .

[6]  Edsger W. Dijkstra,et al.  A note on two problems in connexion with graphs , 1959, Numerische Mathematik.

[7]  R. Duncan Luce,et al.  Individual Choice Behavior , 1959 .

[8]  Claude E. Shannon,et al.  Two-way Communication Channels , 1961 .

[9]  Alvin W Drake,et al.  Observation of a Markov process through a noisy channel , 1962 .

[10]  R. E. Kalman,et al.  When Is a Linear Control System Optimal , 1964 .

[11]  Ronald A. Howard,et al.  Information Value Theory , 1966, IEEE Trans. Syst. Sci. Cybern..

[12]  Nils J. Nilsson,et al.  A Formal Basis for the Heuristic Determination of Minimum Cost Paths , 1968, IEEE Trans. Syst. Sci. Cybern..

[13]  C. G. Broyden The Convergence of a Class of Double-rank Minimization Algorithms 1. General Considerations , 1970 .

[14]  R. Fletcher,et al.  A New Approach to Variable Metric Algorithms , 1970, Comput. J..

[15]  D. Shanno Conditioning of Quasi-Newton Methods for Function Minimization , 1970 .

[16]  D. McFadden Conditional logit analysis of qualitative choice behavior , 1972 .

[17]  H. Marko,et al.  The Bidirectional Communication Theory - A Generalization of Information Theory , 1973, IEEE Transactions on Communications.

[18]  R. Aumann Subjectivity and Correlation in Randomized Strategies , 1974 .

[19]  Moshe Ben-Akiva,et al.  STRUCTURE OF PASSENGER TRAVEL DEMAND MODELS , 1974 .

[20]  Jack K. Wolf,et al.  The capacity region of a multiple-access discrete memoryless channel can increase with feedback (Corresp.) , 1975, IEEE Trans. Inf. Theory.

[21]  Ronald A. Howard,et al.  Development of Automated Aids for Decision Analysis , 1976 .

[23]  Robert J. Sternberg,et al.  Handbook of human intelligence , 1984 .

[24]  Ross D. Shachter Evaluating Influence Diagrams , 1986, Oper. Res..

[25]  Dean A. Pomerleau  ALVINN: An Autonomous Land Vehicle in a Neural Network , 1989, NIPS.

[26]  E. T. Jaynes,et al.  The Relation of Bayesian and Maximum Entropy Methods , 1988 .

[27]  Eitan Zemel,et al.  Nash and correlated equilibria: Some complexity considerations , 1989 .

[28]  Don Coppersmith,et al.  Matrix multiplication via arithmetic progressions , 1987, STOC.

[29]  J. Massey CAUSALITY, FEEDBACK AND DIRECTED INFORMATION , 1990 .

[30]  M. Weiser The Computer for the Twenty-First Century , 1991 .

[31]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[32]  R. McKelvey,et al.  An experimental study of the centipede game , 1992 .

[33]  Ross D. Shachter,et al.  Decision Making Using Probabilistic Inference Methods , 1992, UAI.

[34]  David Heckerman,et al.  Troubleshooting Under Uncertainty , 1994 .

[35]  Martin L. Puterman,et al.  Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .

[36]  Manuela M. Veloso,et al.  Planning and Learning by Analogical Reasoning , 1994, Lecture Notes in Computer Science.

[37]  Yurii Nesterov,et al.  Interior-point polynomial algorithms in convex programming , 1994, Siam studies in applied mathematics.

[38]  D. Stahl,et al.  On Players' Models of Other Players: Theory and Experimental Evidence , 1995 .

[39]  R. McKelvey,et al.  Quantal Response Equilibria for Normal Form Games , 1995 .

[40]  Masaki Hayashi,et al.  On motion planning of mobile robots which coexist and cooperate with human , 1995, Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems. Human Robot Interaction and Cooperative Robots.

[41]  Eric Horvitz,et al.  A Graph-Theoretic Analysis of Information Value , 1996, UAI.

[42]  Aaron Steinfeld,et al.  Destination Entry and Retrieval with the ALI-SCOUT Navigation System , 1996 .

[43]  J. Filar,et al.  Competitive Markov Decision Processes , 1996 .

[44]  H. Kuk On equilibrium points in bimatrix games , 1996 .

[45]  S. Hart,et al.  A simple adaptive procedure leading to correlated equilibrium , 2000 .

[46]  S. Kullback and R. A. Leibler  On Information and Sufficiency , 1951, Annals of Mathematical Statistics.

[47]  Manfred K. Warmuth,et al.  Exponentiated Gradient Versus Gradient Descent for Linear Predictors , 1997, Inf. Comput..

[48]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[49]  Gerhard Kramer,et al.  Directed information for channels with feedback , 1998 .

[50]  R. McKelvey,et al.  Quantal Response Equilibria for Extensive Form Games , 1998 .

[51]  Nevin Lianwen Zhang,et al.  Probabilistic Inference in Influence Diagrams , 1998, Comput. Intell..

[52]  Stephen P. Boyd,et al.  Linear Matrix Inequalities in System and Control Theory , 1994, SIAM.

[53]  Miguel A. Costa-Gomes,et al.  Cognition and Behavior in Normal-Form Games: An Experimental Study , 1998 .

[54]  Eric van Damme,et al.  Non-Cooperative Games , 2000 .

[55]  J. Pearl Causality: Models, Reasoning and Inference , 2000 .

[56]  Andrew Y. Ng and Stuart J. Russell  Algorithms for Inverse Reinforcement Learning , 2000, ICML.

[57]  David C. Hogg,et al.  Learning Variable-Length Markov Models of Behavior , 2001, Comput. Vis. Image Underst..

[58]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[59]  Panos E. Trahanias,et al.  Predictive autonomous robot navigation , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[60]  Kevin P. Murphy  Dynamic Bayesian networks: representation, inference and learning , 2002, Ph.D. thesis, UC Berkeley.

[61]  Wolfram Burgard,et al.  Learning motion patterns of persons for mobile service robots , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).

[62]  Svetha Venkatesh,et al.  Policy Recognition in the Abstract Hidden Markov Model , 2002 .

[63]  Rajmohan Madhavan,et al.  Moving object prediction for off-road autonomous navigation , 2003, SPIE Defense + Commercial Sensing.

[64]  Hagai Attias,et al.  Planning by Probabilistic Inference , 2003, AISTATS.

[65]  E. Jaynes  Probability theory: the logic of science , 2003 .

[66]  Amy Greenwald and Keith Hall  Correlated Q-Learning , 2003, ICML.

[67]  A 2D Collision Warning Framework Based on a Monte Carlo Approach , 2004.

[68]  Kent Larson,et al.  Activity Recognition in the Home Using Simple and Ubiquitous Sensors , 2004, Pervasive.

[69]  Raj Madhavan,et al.  A hierarchical, multi-resolutional moving object prediction approach for autonomous on-road driving , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.

[70]  Sekhar Tatikonda,et al.  Control under communication constraints , 2004, IEEE Transactions on Automatic Control.

[71]  P. D. Grünwald and A. P. Dawid  Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory , 2004, Annals of Statistics (math/0410076).

[72]  J. Andrew Bagnell  Learning decisions: robustness, uncertainty, and approximation , 2004, Ph.D. thesis, Carnegie Mellon University.

[73]  Gunnar Rätsch,et al.  Matrix Exponentiated Gradient Updates for On-line Learning and Bregman Projection , 2004, J. Mach. Learn. Res..

[74]  Thomas D. Nielsen,et al.  Learning a decision maker's utility function from (possibly) inconsistent behavior , 2004, Artif. Intell..

[75]  Ling Bao,et al.  Activity Recognition from User-Annotated Acceleration Data , 2004, Pervasive.

[76]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[77]  Thierry Fraichard,et al.  Safe motion planning in dynamic environments , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[78]  Anthony Stentz,et al.  Field D*: An Interpolation-Based Path Planner and Replanner , 2005, ISRR.

[79]  H. J. Kappen  Linear theory for control of nonlinear stochastic systems , 2005, Physical Review Letters.

[80]  Tim Roughgarden,et al.  Computing equilibria in multi-player games , 2005, SODA '05.

[81]  Joshua B. Tenenbaum,et al.  Bayesian models of human action understanding , 2005, NIPS.

[82]  Andreas Krause,et al.  Optimal Nonmyopic Value of Information in Graphical Models - Efficient Algorithms and Theoretical Limits , 2005, IJCAI.

[83]  Sebastian Thrun, Wolfram Burgard, and Dieter Fox  Probabilistic Robotics (Intelligent Robotics and Autonomous Agents) , 2005, MIT Press.

[84]  Laurent El Ghaoui,et al.  Robust Control of Markov Decision Processes with Uncertain Transition Matrices , 2005, Oper. Res..

[85]  Michael L. Littman,et al.  Cyclic Equilibria in Markov Games , 2005, NIPS.

[86]  Rajesh P. N. Rao,et al.  Goal-Based Imitation as Probabilistic Inference over Graphical Models , 2005, NIPS.

[87]  J. Andrew Bagnell,et al.  Maximum margin planning , 2006, ICML.

[88]  Brett Browning,et al.  Learning to Predict Driver Route and Destination Intent , 2006, 2006 IEEE Intelligent Transportation Systems Conference.

[89]  Stephen P. Boyd and Lieven Vandenberghe  Convex Optimization , 2004, Cambridge University Press.

[90]  Eric Horvitz,et al.  Trip Router with Individualized Preferences (TRIP): Incorporating Personalization into Route Planning , 2006, AAAI.

[91]  Marc Toussaint,et al.  Probabilistic inference for solving discrete and continuous state Markov Decision Processes , 2006, ICML.

[92]  Claude E. Shannon  A Mathematical Theory of Communication , 1948, Bell System Technical Journal.

[93]  Emanuel Todorov,et al.  Linearly-solvable Markov decision problems , 2006, NIPS.

[94]  Robert A. MacLachlan,et al.  Tracking Moving Objects From a Moving Vehicle Using a Laser Scanner , 2006 .

[95]  Mark W. Schmidt,et al.  Accelerated training of conditional random fields with stochastic gradient methods , 2006, ICML.

[96]  David M. Bradley,et al.  Boosting Structured Prediction for Imitation Learning , 2006, NIPS.

[97]  Geoffrey J. Gordon,et al.  Multi-Robot Negotiation: Approximating the Set of Subgame Perfect Equilibria in General-Sum Stochastic Games , 2006, NIPS.

[98]  Eric Horvitz,et al.  Predestination: Inferring Destinations from Partial Trajectories , 2006, UbiComp.

[99]  Yang Zhang,et al.  CarTel: a distributed mobile sensor computing system , 2006, SenSys '06.

[100]  Henry A. Kautz,et al.  Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields , 2007, Int. J. Robotics Res..

[101]  H. Robbins and S. Monro  A Stochastic Approximation Method , 1951, Annals of Mathematical Statistics.

[102]  Ross D. Shachter Advances in Decision Analysis: Model Building with Belief Networks and Influence Diagrams , 2007 .

[103]  Henry A. Kautz,et al.  Learning and inferring transportation routines , 2004, Artif. Intell.

[104]  Luis E. Ortiz,et al.  Maximum Entropy Correlated Equilibria , 2007, AISTATS.

[105]  Manuela M. Veloso,et al.  Conditional random fields for activity recognition , 2007, AAMAS '07.

[106]  Robert E. Schapire,et al.  A Game-Theoretic Approach to Apprenticeship Learning , 2007, NIPS.

[107]  Chris L. Baker,et al.  Goal Inference as Inverse Planning , 2007 .

[108]  Christian Laugier,et al.  Intentional motion on-line learning and prediction , 2008, Machine Vision and Applications.

[109]  Csaba Szepesvári,et al.  Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods , 2007, UAI.

[110]  Ariel Caticha,et al.  Updating Probabilities with Data and Moments , 2007, ArXiv.

[111]  Eyal Amir,et al.  Bayesian Inverse Reinforcement Learning , 2007, IJCAI.

[112]  Brian D. Ziebart,et al.  Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[113]  Brian D. Ziebart,et al.  Navigate like a cabbie: probabilistic reasoning from observed context-aware behavior , 2008, UbiComp.

[114]  Michael H. Bowling,et al.  Apprenticeship learning using linear programming , 2008, ICML '08.

[115]  Haim H. Permuter,et al.  On directed information and gambling , 2008, 2008 IEEE International Symposium on Information Theory.

[116]  Anind K. Dey,et al.  Fast Planning for Dynamic Preferences , 2008, ICAPS.

[117]  Emanuel Todorov,et al.  Efficient computation of optimal actions , 2009, Proceedings of the National Academy of Sciences.

[118]  Brian D. Ziebart,et al.  Planning-based prediction for pedestrians , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[119]  Miroslav Dudík,et al.  A Sampling-Based Approach to Computing Equilibria in Succinct Extensive-Form Games , 2009, UAI.

[120]  Brett Browning,et al.  A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..

[121]  Siddhartha S. Srinivasa,et al.  Inverse Optimal Heuristic Control for Imitation Learning , 2009, AISTATS.

[122]  Marc Toussaint,et al.  Robot trajectory optimization using approximate inference , 2009, ICML '09.

[123]  Emanuel Todorov,et al.  Compositionality of optimal control laws , 2009, NIPS.

[124]  Charles L. Isbell,et al.  Solving Stochastic Games , 2009, NIPS.

[125]  Brian D. Ziebart,et al.  Maximum Causal Entropy Correlated Equilibria for Markov Games , 2011, Interactive Decision Theory and Game Theory.

[126]  Emanuel Todorov,et al.  Inverse Optimal Control with Linearly-Solvable MDPs , 2010, ICML.

[127]  Kevin Leyton-Brown,et al.  Beyond equilibrium: predicting human behaviour in normal form games , 2010, AAAI.

[128]  Christian Vollmer,et al.  Learning to navigate through crowded environments , 2010, 2010 IEEE International Conference on Robotics and Automation.

[129]  Brian D. Ziebart,et al.  Modeling Interaction via the Principle of Maximum Causal Entropy , 2010, ICML.

[130]  Naftali Tishby and Daniel Polani  Information Theory of Decisions and Actions , 2011, Perception-Action Cycle.

[131]  Vicenç Gómez,et al.  Optimal control as a graphical model inference problem , 2009, Machine Learning.