A (Revised) Survey of Approximate Methods for Solving Partially Observable Markov Decision Processes
[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[2] R. Bellman, et al. Polynomial approximation—a new computational technique in dynamic programming: Allocation processes, 1963.
[3] E. J. Sondik, et al. The Optimal Control of Partially Observable Markov Decision Processes, 1971.
[4] T. A. Bancroft, et al. Statistical Papers in Honor of George W. Snedecor, 1972.
[6] D. Rubin, et al. Maximum Likelihood from Incomplete Data via the EM Algorithm, 1977.
[7] Nils J. Nilsson, et al. Principles of Artificial Intelligence, 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[8] Peter W. Glynn, et al. Proceedings of the 1986 Winter Simulation Conference, 1986.
[9] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[10] Alan Weiss, et al. Sensitivity analysis via likelihood ratios, 1986, WSC '86.
[11] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[12] A. Poritz, et al. Hidden Markov models: a guided tour, 1988, ICASSP-88, International Conference on Acoustics, Speech, and Signal Processing.
[13] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.
[14] Alan Weiss, et al. Sensitivity Analysis for Simulations via Likelihood Ratios, 1989, Oper. Res.
[15] Jürgen Schmidhuber, et al. Reinforcement Learning in Markovian and Non-Markovian Environments, 1990, NIPS.
[16] Michael I. Jordan, et al. Advances in Neural Information Processing Systems, 1995.
[17] Peter W. Glynn, et al. Likelihood ratio gradient estimation for stochastic systems, 1990, CACM.
[18] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes, 1991.
[19] Jürgen Schmidhuber, et al. Learning Complex, Extended Sequences Using the Principle of History Compression, 1992, Neural Computation.
[20] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[21] Long Lin, et al. Memory Approaches to Reinforcement Learning in Non-Markovian Domains, 1992.
[22] Dana Ron, et al. The Power of Amnesia, 1993, NIPS.
[23] Hervé Bourlard, et al. Connectionist Speech Recognition: A Hybrid Approach, 1993.
[24] Enrico Macii, et al. Algebraic decision diagrams and their applications, 1993, Proceedings of 1993 International Conference on Computer Aided Design (ICCAD).
[25] Daniel S. Weld, et al. A Probabilistic Model of Action for Least-Commitment Planning with Information Gathering, 1994, UAI.
[26] Yoshua Bengio, et al. An Input Output HMM Architecture, 1994, NIPS.
[27] Leslie Pack Kaelbling, et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.
[28] Daw-Tung Lin, et al. The Adaptive Time-Delay Neural Network: Characterization and Applications to Pattern Recognition, Prediction and Signal Processing, 1994.
[29] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[30] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[31] Michael I. Jordan, et al. Reinforcement Learning with Soft State Aggregation, 1994, NIPS.
[32] Reid G. Simmons, et al. Probabilistic Robot Navigation in Partially Observable Environments, 1995, IJCAI.
[33] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[34] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[35] Nevin L. Zhang. Efficient planning in stochastic domains through exploiting problem characteristics, 1995.
[36] Illah R. Nourbakhsh, et al. DERVISH - An Office-Navigating Robot, 1995, AI Mag.
[37] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[38] Wenju Liu, et al. Planning in Stochastic Domains: Problem Characteristics and Approximation, 1996.
[39] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[40] Craig Boutilier, et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations, 1996, AAAI/IAAI, Vol. 2.
[41] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.
[42] Rafal Salustowicz, et al. Probabilistic Incremental Program Evolution, 1997.
[43] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[44] Richard Washington, et al. BI-POMDP: Bounded, Incremental, Partially-Observable Markov-Model Planning, 1997, ECP.
[45] Shigenobu Kobayashi, et al. Reinforcement Learning in POMDPs with Function Approximation, 1997, ICML.
[46] Milos Hauskrecht, et al. Incremental Methods for Computing Bounds in Partially Observable Markov Decision Processes, 1997, AAAI/IAAI.
[47] Ronen I. Brafman, et al. A Heuristic Variable Grid Solution Method for POMDPs, 1997, AAAI/IAAI.
[48] Shigenobu Kobayashi, et al. Reinforcement learning for continuous action using stochastic gradient ascent, 1998.
[49] Akira Hayashi, et al. A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory, 1998, NIPS.
[50] Mark D. Pendrith, et al. An Analysis of Direct Reinforcement Learning in Non-Markovian Domains, 1998, ICML.
[51] Xavier Boyen, et al. Tractable Inference for Complex Stochastic Processes, 1998, UAI.
[52] Eric A. Hansen, et al. Solving POMDPs by Searching in Policy Space, 1998, UAI.
[53] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[54] Satinder P. Singh, et al. Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes, 1998, NIPS.
[55] A. Cassandra, et al. Exact and approximate algorithms for partially observable Markov decision processes, 1998.
[56] Balaraman Ravindran, et al. Improved Switching among Temporally Abstract Actions, 1998, NIPS.
[57] Anne Condon, et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems, 1999, AAAI/IAAI.
[58] Kee-Eung Kim, et al. Solving POMDPs by Searching the Space of Finite Policies, 1999, UAI.
[59] Brian Sallans, et al. Learning Factored Representations for Partially Observable Markov Decision Processes, 1999, NIPS.
[60] Jim Blythe, et al. Decision-Theoretic Planning, 1999, AI Mag.
[61] Leslie Pack Kaelbling, et al. Learning Policies with External Memory, 1999, ICML.
[62] David A. McAllester, et al. Approximate Planning for Factored POMDPs using Belief State Simplification, 1999, UAI.
[63] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[64] John J. Grefenstette, et al. Evolutionary Algorithms for Reinforcement Learning, 1999, J. Artif. Intell. Res.
[65] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[66] Daphne Koller, et al. Reinforcement Learning Using Approximate Belief States, 1999, NIPS.
[67] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.
[68] Sebastian Thrun, et al. Monte Carlo POMDPs, 1999, NIPS.
[69] Thomas G. Dietterich. An Overview of MAXQ Hierarchical Reinforcement Learning, 2000, SARA.
[70] Craig Boutilier, et al. Value-Directed Belief State Approximation for POMDPs, 2000, UAI.
[71] Daphne Koller, et al. Policy Iteration for Factored MDPs, 2000, UAI.
[72] Milos Hauskrecht, et al. Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.
[73] Thomas G. Dietterich, et al. A POMDP Approximation Algorithm That Anticipates the Need to Observe, 2000, PRICAI.
[74] Sridhar Mahadevan, et al. Hierarchical Memory-Based Reinforcement Learning, 2000, NIPS.
[75] Doina Precup, et al. Temporal abstraction in reinforcement learning, 2000, ICML 2000.
[76] Leslie Pack Kaelbling, et al. Adaptive Importance Sampling for Estimation in Structured Domains, 2000, UAI.
[77] Peter L. Bartlett, et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent, 2000, ICML.
[78] Judy Goldsmith, et al. Nonapproximability Results for Partially Observable Markov Decision Processes, 2001, Universität Trier, Mathematik/Informatik, Forschungsbericht.
[79] Alain Dutech, et al. Solving POMDPs Using Selected Past Events, 2000, ECAI.
[80] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[81] Kee-Eung Kim, et al. Approximate Solutions to Factored Markov Decision Processes via Greedy Search in the Space of Finite State Controllers, 2000, AIPS.
[82] P. Lanzi. Adaptive Agents with Reinforcement Learning and Internal Memory, 2000.
[83] J. Tsitsiklis, et al. Gradient-Based Optimization of Markov Reward Processes: Practical Variants, 2000.
[84] Zhengzhu Feng, et al. Dynamic Programming for POMDPs Using a Factored State Representation, 2000, AIPS.
[85] Katia P. Sycara, et al. Evolutionary Search, Stochastic Policies with Memory, and Reinforcement Learning with Hidden State, 2001, ICML.
[86] Peter L. Bartlett, et al. Infinite-Horizon Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[87] Weihong Zhang, et al. Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes, 2001, J. Artif. Intell. Res.
[88] Shie Mannor, et al. Learning Embedded Maps of Markov Processes, 2001, ICML.
[89] Sebastian Thrun, et al. Integrating value functions and policy search for continuous Markov Decision Processes, 2001, NIPS 2001.
[90] Craig Boutilier, et al. Value-directed sampling methods for monitoring POMDPs, 2001, UAI 2001.
[91] Craig Boutilier, et al. Vector-space Analysis of Belief-state Approximation for POMDPs, 2001, UAI.
[92] Ronald E. Parr, et al. Solving Factored POMDPs with Linear Value Functions, 2001.
[93] Lex Weaver, et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, 2001, UAI.
[94] Jürgen Schmidhuber, et al. Market-Based Reinforcement Learning in Partially Observable Worlds, 2001, ICANN.
[95] Lex Weaver, et al. A Multi-Agent Policy-Gradient Approach to Network Routing, 2001, ICML.
[96] Olivier Buffet, et al. Multi-Agent Systems by Incremental Gradient Reinforcement Learning, 2001, IJCAI.
[97] Andrew W. Moore, et al. Direct Policy Search using Paired Statistical Tests, 2001, ICML.
[98] Sridhar Mahadevan, et al. Continuous-Time Hierarchical Reinforcement Learning, 2001, ICML.
[99] Peter L. Bartlett, et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation, 2001, J. Artif. Intell. Res.
[100] Christian R. Shelton, et al. Policy Improvement for POMDPs Using Normalized Importance Sampling, 2001, UAI.
[101] Bram Bakker, et al. Reinforcement Learning with LSTM in Non-Markovian Tasks with Long-Term Dependencies, 2001.
[102] Christian R. Shelton, et al. Importance sampling for reinforcement learning with multiple objectives, 2001.
[103] Gerald DeJong, et al. Reinforcement Learning and Shaping: Encouraging Intended Behaviors, 2002, ICML.
[104] Peter L. Bartlett, et al. Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning, 2000, J. Comput. Syst. Sci.
[105] Leonid Peshkin, et al. Learning from Scarce Experience, 2002, ICML.
[106] Douglas Aberdeen, et al. Policy-Gradient Algorithms for Partially Observable Markov Decision Processes, 2003.
[107] Peter L. Bartlett, et al. Model Selection and Error Estimation, 2000, Machine Learning.
[108] Peter L. Bartlett, et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning, 2001, J. Mach. Learn. Res.
[109] Sridhar Mahadevan, et al. Hierarchical Multiagent Reinforcement Learning, 2004.
[110] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[111] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[112] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.