Learning and Solving Partially Observable Markov Decision Processes
[1] Craig Boutilier et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations, 1996, AAAI/IAAI, Vol. 2.
[2] Michael L. Littman et al. Planning with predictive state representations, 2004, International Conference on Machine Learning and Applications.
[3] Michael R. James et al. Learning and discovery of predictive state representations in dynamical systems with reset, 2004, ICML.
[4] Aude Billard et al. From Animals to Animats, 2004.
[5] Martin L. Puterman et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[6] Andrew McCallum et al. Instance-Based State Identification for Reinforcement Learning, 1994, NIPS.
[7] Sebastian Thrun et al. Learning low dimensional predictive representations, 2004, ICML.
[8] Jürgen Schmidhuber et al. HQ-Learning, 1997, Adapt. Behav.
[9] Reid G. Simmons et al. Point-Based POMDP Algorithms: Improved Analysis and Implementation, 2005, UAI.
[10] Jesse Hoey et al. An analytic solution to discrete Bayesian reinforcement learning, 2006, ICML.
[11] Satinder P. Singh et al. Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes, 1998, NIPS.
[12] Guy Shani et al. Model-Based Online Learning of POMDPs, 2005, ECML.
[13] Andrew McCallum et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.
[14] Hagit Shatkay et al. Learning Hidden Markov Models with Geometrical Constraints, 1999, UAI.
[15] John N. Tsitsiklis et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[16] Nikos A. Vlassis et al. Perseus: Randomized Point-based Value Iteration for POMDPs, 2005, J. Artif. Intell. Res.
[17] Guy Shani et al. Prioritizing Point-Based POMDP Solvers, 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[18] Kevin D. Seppi et al. Prioritization Methods for Accelerating MDP Solvers, 2005, J. Mach. Learn. Res.
[19] Richard S. Sutton et al. Predictive Representations of State, 2001, NIPS.
[20] Michael R. James et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems, 2004, UAI.
[21] Richard S. Sutton et al. Planning by Incremental Dynamic Programming, 1991, ML.
[22] John Loch et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes, 1998, ICML.
[23] Ronald A. Howard et al. Dynamic Programming and Markov Processes, 1960.
[24] Hector Geffner et al. Solving Large POMDPs using Real Time Dynamic Programming, 1998.
[25] Richard Washington et al. BI-POMDP: Bounded, Incremental, Partially-Observable Markov-Model Planning, 1997, ECP.
[26] Jeff A. Bilmes et al. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models, 1998.
[27] Sean R. Eddy et al. What is dynamic programming?, 2004, Nature Biotechnology.
[28] Xavier Boyen et al. Tractable Inference for Complex Stochastic Processes, 1998, UAI.
[29] Michael L. Littman et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes, 1997, UAI.
[30] Tom M. Mitchell et al. Reinforcement learning with hidden states, 1993.
[31] Andrew G. Barto et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[32] Craig Boutilier et al. Bounded Finite State Controllers, 2003, NIPS.
[33] Ronald E. Parr et al. Solving Factored POMDPs with Linear Value Functions, 2001.
[34] Brahim Chaib-draa et al. An online POMDP algorithm for complex multiagent environments, 2005, AAMAS.
[35] Craig Boutilier et al. Value-Directed Compression of POMDPs, 2002, NIPS.
[36] William S. Lovejoy et al. Computationally Feasible Bounds for Partially Observed Markov Decision Processes, 1991, Oper. Res.
[37] Andrew McCallum et al. Reinforcement learning with selective perception and hidden state, 1996.
[38] Joelle Pineau et al. Anytime Point-Based Approximations for Large POMDPs, 2006, J. Artif. Intell. Res.
[39] Brahim Chaib-draa et al. AEMS: An Anytime Online Search Algorithm for Approximate Policy Refinement in Large POMDPs, 2007, IJCAI.
[40] Joelle Pineau et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.
[41] L. Baird. Reinforcement Learning Through Gradient Descent, 1999.
[42] Eric A. Hansen et al. An Improved Grid-Based Approximation Algorithm for POMDPs, 2001, IJCAI.
[43] Doina Precup et al. Belief Selection in Point-Based Planning Algorithms for POMDPs, 2006, Canadian Conference on AI.
[44] Enrico Macii et al. Algebraic decision diagrams and their applications, 1993, International Conference on Computer Aided Design (ICCAD).
[45] Akira Hayashi et al. Viewing Classifier Systems as Model Free Learning in POMDPs, 1998, NIPS.
[46] Guy Shani et al. Resolving Perceptual Aliasing In The Presence Of Noisy Sensors, 2004, NIPS.
[47] Peter Stone et al. Learning Predictive State Representations, 2003, ICML.
[48] Illah R. Nourbakhsh et al. Learning Probabilistic Models for Decision-Theoretic Navigation of Mobile Robots, 2000, ICML.
[49] Richard S. Sutton et al. Reinforcement Learning with Replacing Eligibility Traces, 2005, Machine Learning.
[50] Craig Boutilier et al. Stochastic Local Search for POMDP Controllers, 2004, AAAI.
[51] Leslie Pack Kaelbling et al. Learning Policies with External Memory, 1999, ICML.
[52] Daniel Nikovski et al. State-aggregation algorithms for learning probabilistic models for robot control, 2002.
[53] Eric A. Hansen et al. Solving POMDPs by Searching in Policy Space, 1998, UAI.
[54] Milos Hauskrecht et al. Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.
[55] Michael L. Littman et al. Efficient dynamic-programming updates in partially observable Markov decision processes, 1995.
[56] Akira Hayashi et al. A Bayesian Approach to Model Learning in Non-Markovian Environments, 1997, ICML.
[57] Alexander S. Yeh et al. More accurate tests for the statistical significance of result differences, 2000, COLING.
[58] S. E. Shimony et al. Partial Observability Under Noisy Sensors — From Model-Free to Model-Based, 2005.
[59] Joshua J. Estelle. Reinforcement Learning in POMDPs: Instance-Based State Identification vs. Fixed Memory Representations, 2003.
[60] Michael L. Littman et al. Memoryless policies: theoretical limitations and practical results, 1994.
[62] Leslie Pack Kaelbling et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.
[63] Stuart J. Russell et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[64] Sergey V. Alexandrov et al. Ratbert: Nearest Sequence Memory Based Prediction Model Applied to Robot Navigation, 2003.
[65] N. Zhang et al. Algorithms for partially observable Markov decision processes, 2001.
[66] Reid G. Simmons et al. Heuristic Search Value Iteration for POMDPs, 2004, UAI.
[67] D. Aberdeen et al. A (Revised) Survey of Approximate Methods for Solving Partially Observable Markov Decision Processes, 2003.
[68] Kee-Eung Kim et al. Solving POMDPs by Searching the Space of Finite Policies, 1999, UAI.
[69] Leslie Pack Kaelbling et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.
[70] Kee-Eung Kim et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.
[71] Long-Ji Lin et al. Memory Approaches to Reinforcement Learning in Non-Markovian Domains, 1992.
[72] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[73] Andrew W. Moore et al. Prioritized sweeping: Reinforcement learning with less data and less time, 2004, Machine Learning.
[74] P. Lanzi et al. Adaptive Agents with Reinforcement Learning and Internal Memory, 2000.
[75] Marco Wiering et al. Utile distinction hidden Markov models, 2004, ICML.
[76] Stuart J. Russell et al. Reinforcement Learning with Hierarchies of Machines, 1997, NIPS.
[77] Andrew W. Moore et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[78] Dana H. Ballard et al. Learning to perceive and act by trial and error, 1991, Machine Learning.
[79] Randal E. Bryant et al. Graph-Based Algorithms for Boolean Function Manipulation, 1986, IEEE Transactions on Computers.
[80] Leslie Pack Kaelbling et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.
[81] Akira Hayashi et al. A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory, 1998, NIPS.
[82] Lonnie Chrisman et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[83] Sridhar Mahadevan et al. Hierarchical Memory-Based Reinforcement Learning, 2000, NIPS.
[84] Zhengzhu Feng et al. Dynamic Programming for POMDPs Using a Factored State Representation, 2000, AIPS.
[85] Andrew W. Moore et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[86] Jesse Hoey et al. Assisting persons with dementia during handwashing using a partially observable Markov decision process, 2007, ICVS.
[87] Jesse Hoey et al. APRICODD: Approximate Policy Construction Using Decision Diagrams, 2000, NIPS.
[88] Stuart J. Russell et al. Adaptive Probabilistic Networks with Hidden Variables, 1997, Machine Learning.
[89] Edward J. Sondik et al. The optimal control of partially observable Markov processes, 1971.
[90] Michael I. Jordan et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[91] Ronen I. Brafman et al. A Heuristic Variable Grid Solution Method for POMDPs, 1997, AAAI/IAAI.
[92] Peter Dayan et al. Q-learning, 1992, Machine Learning.