A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems
[1] P. Lanzi, et al. Adaptive Agents with Reinforcement Learning and Internal Memory, 2000.
[2] Risto Miikkulainen, et al. Solving Non-Markovian Control Tasks with Neuro-Evolution, 1999, IJCAI.
[3] Michael R. James, et al. Learning and discovery of predictive state representations in dynamical systems with reset, 2004, ICML.
[4] Andrew McCallum, et al. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State, 1995, ICML.
[5] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[6] William H. Press, et al. Numerical Recipes in C: The Art of Scientific Computing, 1989.
[7] Milos Hauskrecht, et al. Planning and control in stochastic domains with imperfect information, 1997.
[8] John Loch, et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes, 1998, ICML.
[9] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.
[10] Lei Zheng, et al. A memory-based reinforcement learning algorithm for partially observable Markovian decision processes, 2008, IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[12] E. J. Sondik. The Optimal Control of Partially Observable Markov Decision Processes, 1971.
[13] Bram Bakker, et al. Trading off perception with internal state: reinforcement learning and analysis of Q-Elman networks in a Markovian task, 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), Neural Computing: New Challenges and Perspectives for the New Millennium.
[14] Reid G. Simmons, et al. Point-Based POMDP Algorithms: Improved Analysis and Implementation, 2005, UAI.
[15] Andrew McCallum, et al. Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data, 2004, J. Mach. Learn. Res.
[16] Andrew McCallum, et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.
[17] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[18] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[19] Leslie Pack Kaelbling, et al. Learning Policies with External Memory, 1999, ICML.
[20] Milos Hauskrecht, et al. Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.
[21] R. Bellman. Dynamic Programming, 1957.
[22] Nikos A. Vlassis, et al. Perseus: Randomized Point-based Value Iteration for POMDPs, 2005, J. Artif. Intell. Res.
[23] Michael L. Littman. Memoryless policies: theoretical limitations and practical results, 1994.
[24] Richard S. Sutton, et al. Predictive Representations of State, 2001, NIPS.
[25] Andrew McCallum, et al. Learning to Use Selective Attention and Short-Term Memory in Sequential Tasks, 1996.
[26] William H. Press, et al. Numerical Recipes: The Art of Scientific Computing, Second Edition, 1998.
[27] Eric A. Hansen, et al. Solving POMDPs by Searching in Policy Space, 1998, UAI.
[28] D. Cliff. From Animals to Animats 3: Proceedings of the Third International Conference on Simulation of Adaptive Behavior, 1994.
[29] Bram Bakker, et al. Reinforcement Learning with LSTM in Non-Markovian Tasks with Long-Term Dependencies, 2001.
[30] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.
[31] Neil D. Lawrence, et al. Advances in Neural Information Processing Systems 14, 2002.
[32] Guy Shani, et al. Model-Based Online Learning of POMDPs, 2005, ECML.
[33] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[34] Dario Floreano, et al. From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior, 2000, Journal of Cognitive Neuroscience.
[35] Finale Doshi-Velez, et al. The Infinite Partially Observable Markov Decision Process, 2009, NIPS.
[36] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[37] Anne Condon, et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems, 1999, AAAI/IAAI.
[38] Peter Stone, et al. Learning Predictive State Representations, 2003, ICML.