A (Revised) Survey of Approximate Methods for Solving Partially Observable Markov Decision Processes

Partially observable Markov decision processes (POMDPs) are interesting because they provide a general framework for learning in the presence of multiple forms of uncertainty. We survey methods for learning within the POMDP framework. Because exact methods are intractable, we concentrate on approximate methods. We explore two versions of the POMDP training problem: learning when a model of the POMDP is known, and the much harder problem of learning when a model is not available. The methods used to solve POMDPs are sometimes referred to as reinforcement learning algorithms because the only feedback provided to the agent is a scalar reward signal at each time step.
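For reference, here is a minimal sketch of the standard formulation this framework refers to (the notation is chosen here, not drawn from the survey itself): a POMDP is a tuple \(\langle S, A, O, T, \Omega, R \rangle\) with states \(S\), actions \(A\), observations \(O\), transition model \(T(s' \mid s, a)\), observation model \(\Omega(o \mid s', a)\), and reward function \(R(s, a)\). When the model is known, the agent can summarize its action-observation history with a belief state \(b\), a distribution over \(S\) updated by Bayes' rule after taking action \(a\) and observing \(o\):

\[
  b'(s') \;=\; \frac{\Omega(o \mid s', a)\,\sum_{s \in S} T(s' \mid s, a)\, b(s)}
                    {\sum_{s'' \in S} \Omega(o \mid s'', a)\,\sum_{s \in S} T(s'' \mid s, a)\, b(s)} .
\]

Exact dynamic programming over this continuous belief space is what becomes intractable, motivating the approximate methods surveyed; when no model is available, the belief update itself cannot be computed, and model-free or memory-based approaches are required.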
