Factorized decision forecasting via combining value-based and reward-based estimation

A powerful recent perspective for predicting sequential decisions learns the parameters of decision problems that produce observed behavior as (near) optimal solutions. Under this perspective, behavior is explained in terms of utilities, which can often be defined as functions of state and action features to enable generalization across decision tasks. Two approaches have been proposed from this perspective: estimate a feature-based reward function and recursively compute values from it, or directly estimate a feature-based value function. In this work, we investigate the combination of these two approaches into a single learning task using directed information theory and the principle of maximum entropy. This enables uncovering which type of estimate is most appropriate — in terms of predictive accuracy and/or computational benefit — for different portions of the decision space.

[1]  Anind K. Dey,et al.  Maximum Entropy Inverse Reinforcement Learning , 2008, AAAI.

[2]  H. Marko,et al.  The Bidirectional Communication Theory - A Generalization of Information Theory , 1973, IEEE Transactions on Communications.

[3]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[4]  Nils J. Nilsson,et al.  A Formal Basis for the Heuristic Determination of Minimum Cost Paths , 1968, IEEE Trans. Syst. Sci. Cybern..

[5]  Anind K. Dey,et al.  Navigate like a cabbie: probabilistic reasoning from observed context-aware behavior , 2008, UbiComp.

[6]  Robert E. Schapire,et al.  A Game-Theoretic Approach to Apprenticeship Learning , 2007, NIPS.

[7]  Peter Green,et al.  Markov chain Monte Carlo in Practice , 1996 .

[8]  Gerhard Kramer,et al.  Directed information for channels with feedback , 1998 .

[9]  Stephen P. Boyd,et al.  Linear Matrix Inequalities in Systems and Control Theory , 1994 .

[10]  A. Dawid,et al.  Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory , 2004, math/0410076.

[11]  J. Massey CAUSALITY, FEEDBACK AND DIRECTED INFORMATION , 1990 .

[12]  Vladimir A. Yakubovich,et al.  Linear Matrix Inequalities in System and Control Theory (S. Boyd, L. E. Ghaoui, E. Feron, and V. Balakrishnan) , 1995, SIAM Rev..

[13]  E. Jaynes Information Theory and Statistical Mechanics , 1957 .

[14]  J. Andrew Bagnell,et al.  Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy , 2010 .

[15]  R. Bellman A Markovian Decision Process , 1957 .

[16]  Anind K. Dey,et al.  Modeling Interaction via the Principle of Maximum Causal Entropy , 2010, ICML.

[17]  John Rust Maximum likelihood estimation of discrete control processes , 1988 .

[18]  Eyal Amir,et al.  Bayesian Inverse Reinforcement Learning , 2007, IJCAI.

[19]  Dean Pomerleau,et al.  ALVINN, an autonomous land vehicle in a neural network , 2015 .

[20]  R. E. Kalman,et al.  When Is a Linear Control System Optimal , 1964 .

[21]  Nir Friedman,et al.  Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks , 2004, Machine Learning.

[22]  Csaba Szepesvári,et al.  Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods , 2007, UAI.

[23]  Siddhartha S. Srinivasa,et al.  Planning-based prediction for pedestrians , 2009, 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[24]  Andrew Y. Ng,et al.  Pharmacokinetics of a novel formulation of ivermectin after administration to goats , 2000, ICML.

[25]  David M. Bradley,et al.  Boosting Structured Prediction for Imitation Learning , 2006, NIPS.

[26]  Pieter Abbeel,et al.  Apprenticeship learning via inverse reinforcement learning , 2004, ICML.

[27]  J. Andrew Bagnell,et al.  Maximum margin planning , 2006, ICML.

[28]  Chris L. Baker,et al.  Goal Inference as Inverse Planning , 2007 .

[29]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[30]  E. Yaz Linear Matrix Inequalities In System And Control Theory , 1998, Proceedings of the IEEE.

[31]  Emanuel Todorov,et al.  Inverse Optimal Control with Linearly-Solvable MDPs , 2010, ICML.

[32]  Daphne Koller,et al.  Learning an Agent's Utility Function by Observing Behavior , 2001, ICML.