Paper Information - Targeting Specific Distributions of Trajectories in MDPs

We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.
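The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of reading local policies off a trajectory tree. It assumes, purely for illustration, that every action leads deterministically to a single successor state, so that choosing an action is the same as choosing the next state; the stochastic case and the heuristic for unreachable targets are not covered. The function name `local_policies` and the toy target distribution are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def local_policies(target):
    """Return, for each trajectory prefix, a distribution over next states.

    `target` maps complete trajectories (tuples of states) to target
    probabilities. Assumption (not from the paper): transitions are
    deterministic, so picking an action equals picking the successor.
    """
    # Total target probability of all complete trajectories below each prefix.
    mass = defaultdict(float)
    for trajectory, prob in target.items():
        for k in range(1, len(trajectory) + 1):
            mass[trajectory[:k]] += prob

    # At each prefix, weight every child by its share of the parent's mass.
    policy = defaultdict(dict)
    for prefix, m in mass.items():
        if len(prefix) > 1 and m > 0:
            parent = prefix[:-1]
            policy[parent][prefix[-1]] = m / mass[parent]
    return dict(policy)


# Hypothetical target distribution over three complete trajectories.
target = {
    ("s0", "a", "c"): 0.5,
    ("s0", "a", "d"): 0.25,
    ("s0", "b", "e"): 0.25,
}
policy = local_policies(target)
```

Under the deterministic assumption, multiplying the local probabilities along any root-to-leaf path gives back that trajectory's target probability, which is the sense in which the local policies are consistent with the global distribution in this sketch.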
