Spatial and Temporal Abstractions in POMDPs Applied to Robot Navigation

Abstract: Partially observable Markov decision processes (POMDPs) are a well-studied paradigm for programming autonomous robots, in which the robot sequentially chooses actions to achieve long-term goals efficiently. Unfortunately, for real-world robots and similar domains, the uncertain outcomes of actions and the fact that the true world state may not be fully observable make learning models of the world extremely difficult, and planning with them computationally infeasible. In this paper we show that learning POMDP models and planning with them become significantly easier when we incorporate the notions of spatial and temporal abstraction into our algorithms. We demonstrate the superiority of our algorithms by comparing them with previous flat approaches on large-scale robot navigation tasks.
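To make the two ingredients concrete, the sketch below shows (i) the standard discrete Bayes filter that tracks a POMDP belief state and (ii) a macro-action in the spirit of the options framework, which a hierarchical planner can treat as a single temporally extended action. This is a minimal illustrative sketch, not the paper's implementation: the names (`belief_update`, `Option`, `execute_option`) and the toy two-state corridor model are our own assumptions.

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One step of the discrete POMDP Bayes filter.

    b: (S,)      prior belief over hidden states
    T: (A, S, S) transition model, T[a, s, s2] = P(s2 | s, a)
    O: (A, S, Z) observation model, O[a, s2, o] = P(o | s2, a)
    """
    predicted = b @ T[a]                 # predict: P(s2 | b, a)
    posterior = predicted * O[a, :, o]   # correct: weight by P(o | s2, a)
    return posterior / posterior.sum()

class Option:
    """A macro-action: a primitive policy plus a belief-dependent
    termination test (temporal abstraction)."""
    def __init__(self, policy, terminate):
        self.policy = policy          # belief -> primitive action index
        self.terminate = terminate    # belief -> bool

def execute_option(b, option, observations, T, O):
    """Run one macro-action to termination, folding each primitive
    observation into the belief; a high-level planner then reasons
    only at the coarser time scale of whole options."""
    for o in observations:
        a = option.policy(b)
        b = belief_update(b, a, o, T, O)
        if option.terminate(b):
            break
    return b

# Toy two-state corridor ("before door", "at door"), one action
# ("forward"): moving succeeds with prob. 0.8, the door sensor is
# 90% accurate. All numbers are illustrative.
T = np.array([[[0.2, 0.8],
               [0.0, 1.0]]])
O = np.array([[[0.9, 0.1],
               [0.1, 0.9]]])
go_to_door = Option(policy=lambda b: 0,
                    terminate=lambda b: b[1] > 0.95)
b = execute_option(np.array([0.5, 0.5]), go_to_door,
                   observations=[1, 1, 1], T=T, O=O)
print(b)   # belief concentrates on the "at door" state
```

Spatial abstraction would enter analogously, for instance by collapsing groups of corridor states into a single abstract state before planning, so that both the belief and the policy are defined over a much smaller space.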
