Bayesian nonparametric approaches for reinforcement learning in partially observable domains

Making intelligent decisions from incomplete information is critical in many applications: for example, medical decisions must often be made based on a few vital signs, without full knowledge of a patient's condition, and speech-based interfaces must infer a user's needs from noisy microphone inputs. What makes these tasks hard is that we often do not even have a natural representation with which to model the task; we must learn about the task's properties while simultaneously performing the task. Learning a representation for a task also involves a trade-off between modeling the data that we have already seen and being able to make predictions about new data streams. In this thesis, we explore one approach for learning representations of stochastic systems using Bayesian nonparametric statistics. Bayesian nonparametric methods allow the sophistication of a representation to scale gracefully with the complexity of the data. We show how representations learned using Bayesian nonparametric methods result in better performance and interesting learned structure in three contexts related to reinforcement learning in partially observable domains: learning partially observable Markov decision processes, taking advantage of expert demonstrations, and learning complex hidden structures such as dynamic Bayesian networks. In each of these contexts, Bayesian nonparametric approaches provide advantages in prediction quality and, often, in computation time.
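To make the scaling property concrete, here is a minimal sketch (not code from the thesis) of the Chinese restaurant process view of a Dirichlet process prior: the number of occupied "tables" (latent states or components) grows slowly with the amount of data, so the representation's capacity adapts to the data rather than being fixed in advance. The function name, the concentration value, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def sample_crp_table_counts(n_customers, alpha, rng=None):
    """Sample seatings from a Chinese restaurant process with concentration
    alpha; the number of occupied tables (latent states) grows roughly as
    alpha * log(n), so model capacity scales gracefully with the data."""
    rng = np.random.default_rng(rng)
    counts = []  # number of customers seated at each existing table
    for _ in range(n_customers):
        # Probability of each existing table is proportional to its count;
        # a new table is opened with probability proportional to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):
            counts.append(1)      # open a new table (new latent state)
        else:
            counts[table] += 1
    return counts

# More observations lead to more, but only sub-linearly more, latent states.
for n in (10, 100, 1000):
    print(n, len(sample_crp_table_counts(n, alpha=1.0, rng=0)))
```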
