Paper information - Factorized decision forecasting via combining value-based and reward-based estimation

A powerful recent perspective for predicting sequential decisions learns the parameters of decision problems that produce observed behavior as (near) optimal solutions. Under this perspective, behavior is explained in terms of utilities, which can often be defined as functions of state and action features to enable generalization across decision tasks. Two approaches have been proposed from this perspective: estimate a feature-based reward function and recursively compute values from it, or directly estimate a feature-based value function. In this work, we investigate the combination of these two approaches into a single learning task using directed information theory and the principle of maximum entropy. This makes it possible to uncover which type of estimate is more appropriate, in terms of predictive accuracy and/or computational benefit, for different portions of the decision space.
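As a rough illustration of the two estimation styles the abstract contrasts, the following Python sketch computes action values either recursively from a linear feature-based reward or directly from a linear feature-based value function. It is not the paper's algorithm: the function names, the linear forms, the tabular transition array, and the finite-horizon soft (log-sum-exp) backup are all assumptions chosen for brevity.

```python
import numpy as np


def soft_q_from_reward(features, theta, transitions, horizon):
    """Reward-based estimate (illustrative assumption, not the paper's method).

    features:    array (S, A, K), feature vector for each state-action pair
    theta:       array (K,), reward weights, so reward(s, a) = theta . f(s, a)
    transitions: array (S, A, S), P(s' | s, a)
    horizon:     number of soft Bellman backups (must be >= 1)
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    reward = features @ theta                 # shape (S, A)
    v = np.zeros(features.shape[0])           # terminal soft values
    for _ in range(horizon):
        q = reward + transitions @ v          # expected next value, shape (S, A)
        m = q.max(axis=1)
        # Numerically stable log-sum-exp over actions (the "soft" maximum).
        v = m + np.log(np.exp(q - m[:, None]).sum(axis=1))
    return q


def direct_q(features, phi):
    """Value-based estimate: Q(s, a) = phi . f(s, a), with no recursion."""
    return features @ phi


def policy_from_q(q):
    """Maximum-entropy style stochastic policy: softmax of Q over actions."""
    z = q - q.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)
```

The reward-based route pays for each backup with a pass over the transition model, while the direct route is a single matrix product. That is one way to picture the computational trade-off the abstract mentions, under the assumptions stated above.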

Brian D. Ziebart
