Paper information - Authorial Idioms for Target Distributions in TTD-MDPs


In designing Markov Decision Processes (MDPs), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there is no clear choice of reward function, and in these cases significant care must be taken to construct a reward function that induces the desired behavior. In this paper, we consider an analogous design problem: crafting a target distribution in Targeted Trajectory Distribution MDPs (TTD-MDPs). TTD-MDPs produce probabilistic policies that minimize divergence from a target distribution over trajectories of an underlying MDP. They are an extension of MDPs that provides variety of experience during repeated execution. Here, we present a brief overview of TTD-MDPs along with approaches for constructing target distributions. We then present a novel authorial idiom for creating target distributions using prototype trajectories. We evaluate these approaches on a drama manager for an interactive game.
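To make the idea of a target distribution built from prototype trajectories concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the construction used in the paper: trajectories are assumed to be tuples of event labels, similarity is assumed to be Levenshtein edit distance, and each candidate is weighted by an exponential decay in its distance to the nearest prototype. The names `edit_distance`, `target_distribution`, and `scale`, as well as the event labels, are hypothetical.

```python
import math
from typing import Dict, Iterable, Sequence, Tuple

Trajectory = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two event sequences (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]


def target_distribution(
    candidates: Iterable[Trajectory],
    prototypes: Sequence[Trajectory],
    scale: float = 1.0,
) -> Dict[Trajectory, float]:
    """Assumed weighting: exp(-d / scale), where d is the distance from a
    candidate to its nearest prototype, normalized over the candidates."""
    weights = {
        t: math.exp(-min(edit_distance(t, p) for p in prototypes) / scale)
        for t in candidates
    }
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


# Hypothetical story trajectories for a drama manager.
candidates = [
    ("intro", "clue", "ending"),
    ("intro", "ending"),
    ("intro", "fight", "chase", "ending"),
]
prototypes = [("intro", "clue", "ending")]
dist = target_distribution(candidates, prototypes)
```

In this sketch the prototype itself receives the most probability mass and mass falls off as trajectories drift away from it; a larger `scale` flattens the distribution and so allows more variety across repeated executions.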

David L. Roberts | Charles Lee Isbell | Sooraj Bhat | Kenneth St. Clair
