Finding Approximate POMDP Solutions Through Belief Compression

Recent research in robotics has demonstrated the utility of probabilistic models for perception and state tracking on deployed robot systems. For example, Kalman filters and Markov localisation have been used successfully in many robot applications (Leonard & Durrant-Whyte, 1991; Thrun et al., 2000). There has also been considerable research into control and decision-making algorithms that are robust to specific kinds of uncertainty (Bagnell & Schneider, 2001). Few control algorithms, however, make use of full probabilistic representations throughout planning. As a consequence, robot control can become increasingly brittle as the system's perceptual and state uncertainty grows. This thesis addresses the problem of decision making under uncertainty. In particular, we use a planning model called the partially observable Markov decision process, or POMDP (Sondik, 1971). The POMDP model computes a policy that maximises the expected future reward based on the complete probabilistic state estimate, or belief. Unfortunately, finding an optimal policy exactly is computationally demanding and thus infeasible for most problems that represent real-world scenarios. This thesis describes a scalable approach to POMDP planning that uses low-dimensional representations of the belief space. We show how a variant of Principal Components Analysis (PCA) called Exponential family PCA (Collins et al., 2002) can be used to compress certain kinds of large real-world POMDPs and to find policies for these problems. By finding low-dimensional representations of POMDPs, we are able to find good policies for problems that are orders of magnitude larger than those solvable by conventional approaches.
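
The central idea can be illustrated with a small numerical sketch. The Python code below is an illustrative sketch, not the thesis implementation: it compresses a set of sampled belief vectors with an exponential-family PCA factorisation using the exponential link and an unnormalised KL-style loss, fit here by plain gradient descent rather than the more sophisticated updates used in the thesis. The function names (epca_compress, reconstruct), learning rate, and problem sizes are assumptions made for the example.

import numpy as np

# Illustrative sketch of exponential-family PCA (E-PCA) belief compression.
# Assumptions for the example: the exponential link f(x) = exp(x), an
# unnormalised KL-style loss, and plain gradient descent. Sizes, learning
# rate, and iteration count are arbitrary.

def epca_compress(B, k, lr=0.005, iters=5000, seed=0):
    """Factor an (n_states x n_beliefs) matrix of sampled beliefs B into a
    basis U (n_states x k) and low-dimensional coordinates V (k x n_beliefs)
    such that exp(U @ V) approximately reconstructs B."""
    rng = np.random.default_rng(seed)
    n_states, n_beliefs = B.shape
    U = 0.1 * rng.standard_normal((n_states, k))
    V = 0.1 * rng.standard_normal((k, n_beliefs))
    for _ in range(iters):
        R = np.exp(U @ V) - B                      # gradient of the loss w.r.t. (U @ V)
        U, V = U - lr * (R @ V.T), V - lr * (U.T @ R)
    return U, V

def reconstruct(U, V):
    """Map low-dimensional coordinates back to normalised beliefs."""
    B_hat = np.exp(U @ V)
    return B_hat / B_hat.sum(axis=0, keepdims=True)

# Usage: compress 200 sampled beliefs over 50 states down to 3 dimensions.
rng = np.random.default_rng(1)
B = rng.dirichlet(np.ones(50), size=200).T         # each column is a belief
U, V = epca_compress(B, k=3)
print("max reconstruction error:", np.abs(reconstruct(U, V) - B).max())

In the thesis, the low-dimensional coordinates recovered in this way become the state space for planning, so a policy is computed over a handful of dimensions rather than over the full belief simplex.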

[1]  A. Householder,et al.  Discussion of a set of points in terms of their mutual distances , 1938 .

[2]  Edsger W. Dijkstra,et al.  A note on two problems in connexion with graphs , 1959, Numerische Mathematik.

[3]  Ronald A. Howard,et al.  Dynamic Programming and Markov Processes , 1960 .

[4]  R. Bellman  Dynamic programming , 1957, Science.

[5]  L. Bregman The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming , 1967 .

[6]  E. J. Sondik,et al.  The Optimal Control of Partially Observable Markov Decision Processes. , 1971 .

[7]  Edward J. Sondik,et al.  The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..

[8]  H. Akaike A new look at the statistical model identification , 1974 .

[9]  Toru Maruyama  Some developments in convex analysis , 1977 .

[10]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM algorithm (with discussion) , 1977 .

[11]  Dimitri P. Bertsekas,et al.  Dynamic Programming and Stochastic Control , 1977, IEEE Transactions on Systems, Man, and Cybernetics.

[12]  G. Schwarz Estimating the Dimension of a Model , 1978 .

[13]  Lawrence R. Rabiner,et al.  A tutorial on Hidden Markov Models , 1986 .

[14]  Pravin Varaiya,et al.  Stochastic Systems: Estimation, Identification, and Adaptive Control , 1986 .

[15]  Marcel Schoppers,et al.  Universal Plans for Reactive Robots in Unpredictable Environments , 1987, IJCAI.

[16]  P. McCullagh,et al.  Generalized Linear Models , 1992 .

[17]  Hsien-Te Cheng,et al.  Algorithms for partially observable markov decision processes , 1989 .

[18]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[19]  Ingemar J. Cox,et al.  Autonomous Robot Vehicles , 1990, Springer New York.

[20]  Sheryl R. Young,et al.  Use of dialogue, pragmatics and semantics to enhance speech recognition , 1990, Speech Commun..

[21]  Leslie Pack Kaelbling,et al.  Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.

[22]  Jean-Claude Latombe,et al.  Robot motion planning , 1991, The Kluwer international series in engineering and computer science.

[23]  William S. Lovejoy,et al.  Computationally Feasible Bounds for Partially Observed Markov Decision Processes , 1991, Oper. Res..

[24]  Hugh F. Durrant-Whyte,et al.  Mobile robot localization by tracking geometric beacons , 1991, IEEE Trans. Robotics Autom..

[25]  D. Moore Simplicial Mesh Generation with Applications , 1992 .

[26]  John A. Nelder,et al.  Generalized linear models. 2nd ed. , 1993 .

[27]  Jean-Claude Latombe,et al.  Planning the Motions of a Mobile Robot in a Sensory Uncertainty Field , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[28]  Gregory Dudek,et al.  Precise positioning using model-based maps , 1994, Proceedings of the 1994 IEEE International Conference on Robotics and Automation.

[29]  Leslie Pack Kaelbling,et al.  Acting Optimally in Partially Observable Stochastic Domains , 1994, AAAI.

[30]  Mark C. Torrance,et al.  Natural communication with robots , 1994 .

[31]  R. Simmons,et al.  Probabilistic Navigation in Partially Observable Environments , 1995 .

[32]  Geoffrey J. Gordon Stable Function Approximation in Dynamic Programming , 1995, ICML.

[33]  Stuart J. Russell,et al.  Stochastic simulation algorithms for dynamic probabilistic networks , 1995, UAI.

[34]  Teuvo Kohonen,et al.  Self-Organizing Maps , 1995 .

[35]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[36]  Andrew McCallum,et al.  Instance-Based Utile Distinctions for Reinforcement Learning , 1995 .

[37]  Reid G. Simmons,et al.  Probabilistic Robot Navigation in Partially Observable Environments , 1995, IJCAI.

[38]  Stuart J. Russell,et al.  Approximating Optimal Policies for Partially Observable Stochastic Domains , 1995, IJCAI.

[39]  Leslie Pack Kaelbling,et al.  Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.

[40]  Illah R. Nourbakhsh,et al.  DERVISH - An Office-Navigating Robot , 1995, AI Mag..

[41]  Mosur Ravishankar,et al.  Efficient Algorithms for Speech Recognition. , 1996 .

[42]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[43]  Scott Davies,et al.  Multidimensional Triangulation and Interpolation for Reinforcement Learning , 1996, NIPS.

[44]  Andrew McCallum,et al.  Reinforcement learning with selective perception and hidden state , 1996 .

[45]  Wolfram Burgard,et al.  Estimating the Absolute Position of a Mobile Robot Using Position Probability Grids , 1996, AAAI/IAAI, Vol. 2.

[46]  Lydia E. Kavraki,et al.  Probabilistic roadmaps for path planning in high-dimensional configuration spaces , 1996, IEEE Trans. Robotics Autom..

[47]  Liqiang Feng,et al.  Navigating Mobile Robots: Systems and Techniques , 1996 .

[48]  Craig Boutilier,et al.  Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations , 1996, AAAI/IAAI, Vol. 2.

[49]  Leslie Pack Kaelbling,et al.  Acting under uncertainty: discrete Bayesian models for mobile-robot navigation , 1996, Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS '96.

[50]  Gregory Dudek,et al.  Vision-based robot localization without explicit object models , 1996, Proceedings of IEEE International Conference on Robotics and Automation.

[51]  Michael L. Littman,et al.  Algorithms for Sequential Decision Making , 1996 .

[52]  Yasuhisa Niimi,et al.  A dialog control strategy based on the reliability of speech recognition , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[53]  Alexander H. Waibel,et al.  Dialogue strategies guiding users to their communicative goals , 1997, EUROSPEECH.

[54]  Richard Washington,et al.  BI-POMDP: Bounded, Incremental, Partially-Observable Markov-Model Planning , 1997, ECP.

[55]  B. S. Manjunath,et al.  An Eigenspace Update Algorithm for Image Analysis , 1997, CVGIP Graph. Model. Image Process..

[56]  Michael L. Littman,et al.  Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes , 1997, UAI.

[57]  Ronen I. Brafman,et al.  A Heuristic Variable Grid Solution Method for POMDPs , 1997, AAAI/IAAI.

[58]  Sam T. Roweis,et al.  EM Algorithms for PCA and SPCA , 1997, NIPS.

[59]  Leslie Pack Kaelbling,et al.  Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[60]  Wolfram Burgard,et al.  A Probabilistic Approach to Concurrent Mapping and Localization for Mobile Robots , 1998, Auton. Robots.

[61]  Wolfram Burgard,et al.  Position Estimation for Mobile Robots in Dynamic Environments , 1998, AAAI/IAAI.

[62]  Christopher M. Bishop,et al.  GTM: The Generative Topographic Mapping , 1998, Neural Computation.

[63]  Wolfram Burgard,et al.  An experimental comparison of localization methods , 1998, Proceedings. 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems. Innovations in Theory, Practice and Applications (Cat. No.98CH36190).

[64]  Wolfram Burgard,et al.  Active Markov localization for mobile robots , 1998, Robotics Auton. Syst..

[65]  Gregory Dudek,et al.  Mobile robot localization from learned landmarks , 1998, Proceedings. 1998 IEEE/RSJ International Conference on Intelligent Robots and Systems. Innovations in Theory, Practice and Applications (Cat. No.98CH36190).

[66]  Xavier Boyen,et al.  Tractable Inference for Complex Stochastic Processes , 1998, UAI.

[67]  Eric A. Hansen,et al.  Solving POMDPs by Searching in Policy Space , 1998, UAI.

[68]  Andrew W. Moore,et al.  Gradient Descent for General Reinforcement Learning , 1998, NIPS.

[69]  A. Cassandra,et al.  Exact and approximate algorithms for partially observable markov decision processes , 1998 .

[70]  Hermann Ney,et al.  Evaluating dialog systems used in the real world , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[71]  Roberto Pieraccini,et al.  Using Markov decision process for learning dialogue strategies , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[72]  Alexander H. Waibel,et al.  Towards spontaneous speech recognition for on-board car navigation and information systems , 1999, EUROSPEECH.

[73]  Andrew W. Moore,et al.  Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems , 1999, IJCAI.

[74]  Kee-Eung Kim,et al.  Solving POMDPs by Searching the Space of Finite Policies , 1999, UAI.

[75]  Antal van den Bosch Instance-Family Abstraction in Memory-Based Language Learning , 1999, ICML.

[76]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[77]  Wolfram Burgard,et al.  Coastal navigation-mobile robot navigation with uncertainty in dynamic environments , 1999, Proceedings 1999 IEEE International Conference on Robotics and Automation (Cat. No.99CH36288C).

[78]  Yishay Mansour,et al.  Approximate Planning in Large POMDPs via Reusable Trajectories , 1999, NIPS.

[79]  W. Burgard,et al.  Markov Localization for Mobile Robots in Dynamic Environments , 1999, J. Artif. Intell. Res..

[80]  Geoffrey J. Gordon,et al.  Approximate solutions to markov decision processes , 1999 .

[81]  Marilyn A. Walker,et al.  Reinforcement Learning for Spoken Dialogue Systems , 1999, NIPS.

[82]  Shimei Pan,et al.  Empirically Evaluating an Adaptable Spoken Dialogue System , 1999, ArXiv.

[83]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[84]  Sebastian Thrun,et al.  Coastal Navigation with Mobile Robots , 1999, NIPS.

[85]  Kee-Eung Kim,et al.  Learning Finite-State Controllers for Partially Observable Environments , 1999, UAI.

[86]  Sebastian Thrun,et al.  Monte Carlo POMDPs , 1999, NIPS.

[87]  Wolfram Burgard,et al.  Experiences with an Interactive Museum Tour-Guide Robot , 1999, Artif. Intell..

[88]  Kurt Konolige,et al.  A gradient method for realtime robot control , 2000, Proceedings. 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2000) (Cat. No.00CH37113).

[89]  Clark F. Olson,et al.  Probabilistic self-localization for mobile robots , 2000, IEEE Trans. Robotics Autom..

[90]  J. Tenenbaum,et al.  A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.

[91]  Wolfram Burgard,et al.  Probabilistic Algorithms and the Interactive Museum Tour-Guide Robot Minerva , 2000, Int. J. Robotics Res..

[92]  Marilyn A. Walker,et al.  Automatic Optimization of Dialogue Management , 2000, COLING.

[93]  Thomas G. Dietterich Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..

[94]  Milos Hauskrecht,et al.  Value-Function Approximations for Partially Observable Markov Decision Processes , 2000, J. Artif. Intell. Res..

[95]  Alex Pentland,et al.  EM for Perceptual Coding and Reinforcement Learning Tasks , 2000 .

[96]  Ben M. Chen  Robust and H∞ control , 2000 .

[97]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[98]  Peter L. Bartlett,et al.  Reinforcement Learning in POMDP's via Direct Gradient Ascent , 2000, ICML.

[99]  Michael I. Jordan,et al.  PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.

[100]  Zhengzhu Feng,et al.  Dynamic Programming for POMDPs Using a Factored State Representation , 2000, AIPS.

[101]  Weihong Zhang,et al.  Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes , 2001, J. Artif. Intell. Res..

[102]  Eric A. Hansen,et al.  An Improved Grid-Based Approximation Algorithm for POMDPs , 2001, IJCAI.

[103]  Sanjoy Dasgupta,et al.  A Generalization of Principal Components Analysis to the Exponential Family , 2001, NIPS.

[104]  Geoffrey E. Hinton,et al.  Global Coordination of Local Linear Models , 2001, NIPS.

[105]  Jeff G. Schneider,et al.  Autonomous helicopter control using reinforcement learning policy search methods , 2001, Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164).

[106]  Sham M. Kakade,et al.  A Natural Policy Gradient , 2001, NIPS.

[107]  Wolfram Burgard,et al.  Robust Monte Carlo localization for mobile robots , 2001, Artif. Intell..

[108]  Bin Yu,et al.  Model Selection and the Principle of Minimum Description Length , 2001 .

[109]  S. LaValle,et al.  Randomized Kinodynamic Planning , 2001 .

[110]  Mukund Balasubramanian,et al.  The isomap algorithm and topological stability. , 2002, Science.

[111]  L. P. Kaelbling,et al.  Learning Geometrically-Constrained Hidden Markov Models for Robot Navigation: Bridging the Topological-Geometrical Gap , 2002, J. Artif. Intell. Res..

[112]  Geoffrey E. Hinton,et al.  Stochastic Neighbor Embedding , 2002, NIPS.

[113]  David Andre,et al.  State abstraction for programmable reinforcement learning agents , 2002, AAAI/IAAI.

[114]  I. Jolliffe Principal Component Analysis , 2002 .

[115]  Geoffrey J. Gordon  Generalized² Linear² Models , 2002, NIPS.

[116]  Leslie Pack Kaelbling,et al.  Learning Geometrically-Constrained Hidden Markov Models for Robot Navigation: Bridging the Geometrical-Topological Gap , 2002 .

[117]  Nicholas Roy,et al.  Exponential Family PCA for Belief Compression in POMDPs , 2002, NIPS.

[118]  William Whittaker,et al.  Conditional particle filters for simultaneous mobile robot localization and people-tracking , 2002, Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No.02CH37292).

[119]  Dieter Fox,et al.  An experimental comparison of localization methods continued , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[120]  Craig Boutilier,et al.  Value-Directed Compression of POMDPs , 2002, NIPS.

[121]  Sebastian Thrun,et al.  Motion planning through policy search , 2002, IEEE/RSJ International Conference on Intelligent Robots and Systems.

[122]  Sridhar Mahadevan,et al.  Hierarchical learning and planning in partially observable markov decision processes , 2002 .

[123]  Joelle Pineau,et al.  Policy-contingent abstraction for robust robot control , 2002, UAI.

[124]  Joelle Pineau,et al.  Point-based value iteration: An anytime algorithm for POMDPs , 2003, IJCAI.

[125]  Anne Condon,et al.  On the undecidability of probabilistic planning and related stochastic optimization problems , 2003, Artif. Intell..

[126]  Andrew W. Moore,et al.  The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces , 1993, Machine Learning.

[127]  Teuvo Kohonen,et al.  Self-organized formation of topologically correct feature maps , 1982, Biological Cybernetics.

[128]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 1992, Machine Learning.

[129]  Michael Isard,et al.  CONDENSATION—Conditional Density Propagation for Visual Tracking , 1998, International Journal of Computer Vision.

[130]  Andrew W. Moore,et al.  Variable Resolution Discretization in Optimal Control , 2002, Machine Learning.

[131]  R. Simmons,et al.  The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms , 1996, Machine Learning.

[132]  Yoram Singer,et al.  The Hierarchical Hidden Markov Model: Analysis and Applications , 1998, Machine Learning.

[133]  M. E. Galassi,et al.  GNU Scientific Library Reference Manual , 2005 .

[134]  Nicholas Roy,et al.  Finding approximate POMDP solutions through belief compression , 2005 .

[135]  George E. Monahan,et al.  A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms , 1982 .

[136]  Gene H. Golub,et al.  Singular value decomposition and least squares solutions , 1970, Milestones in Matrix Computation.