Algorithms for partially observable Markov decision processes
[1] S. Vajda, et al. Games and Decisions: Introduction and Critical Survey, 1958.
[2] Alvin W. Drake, et al. Observation of a Markov process through a noisy channel, 1962.
[3] Cyrus Derman, et al. Finite State Markovian Decision Processes, 1970.
[4] Leon S. Lasdon, et al. Optimization Theory of Large Systems, 1970.
[5] H. Kushner, et al. Mathematical programming and the control of Markov chains, 1971.
[6] Richard Fikes, et al. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving, 1971, IJCAI.
[7] H. Kushner, et al. Decomposition of systems governed by Markov chains, 1974.
[8] Chelsea C. White, et al. Optimal Diagnostic Questionnaires Which Allow Less than Truthful Responses, 1976, Inf. Control.
[9] P. Varaiya, et al. Multilayer control of large Markov chains, 1978.
[10] C. White. Optimal Inspection and Repair of a Production Process Subject to Deterioration, 1978.
[11] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[12] James N. Eagle. The Optimal Search for a Moving Target When the Search Path Is Constrained, 1984, Oper. Res.
[13] Richard E. Korf, et al. Macro-Operators: A Weak Method for Learning, 1985, Artif. Intell.
[14] Nils J. Nilsson. Probabilistic Logic, 1986, Artif. Intell.
[15] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[16] Eric Horvitz, et al. Decision theory in expert systems and artificial intelligence, 1988, Int. J. Approx. Reason.
[17] Chelsea C. White, et al. Solution Procedures for Partially Observed Markov Decision Processes, 1989, Oper. Res.
[18] C. Watkins. Learning from delayed rewards, 1989.
[19] Hsien-Te Cheng, et al. Algorithms for partially observable Markov decision processes, 1989.
[20] Richard E. Korf, et al. Real-Time Heuristic Search, 1990, Artif. Intell.
[21] John L. Bresina, et al. Anytime Synthetic Projection: Maximizing the Probability of Goal Satisfaction, 1990, AAAI.
[22] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[23] Mark S. Boddy, et al. Anytime Problem Solving Using Dynamic Programming, 1991, AAAI.
[24] William S. Lovejoy, et al. Computationally Feasible Bounds for Partially Observed Markov Decision Processes, 1991, Oper. Res.
[25] Anne Condon, et al. The Complexity of Stochastic Games, 1992, Inf. Comput.
[26] Mark A. Peot, et al. Conditional nonlinear planning, 1992.
[27] Daniel S. Weld, et al. UCPOP: A Sound, Complete, Partial Order Planner for ADL, 1992, KR.
[28] C. Atkeson, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time, 1993.
[29] Leslie Pack Kaelbling, et al. Planning With Deadlines in Stochastic Domains, 1993, AAAI.
[30] William S. Lovejoy, et al. Suboptimal Policies, with Bounds, for Parameter Adaptive Decision Processes, 1993, Oper. Res.
[31] J. Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, IEEE International Conference on Neural Networks.
[32] Michael I. Jordan, et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[33] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[34] Chelsea C. White, et al. Finite-Memory Suboptimal Design for Partially Observed Markov Decision Processes, 1994, Oper. Res.
[35] Michael I. Jordan, et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems, 1994, NIPS.
[36] Anthony R. Cassandra, et al. Optimal Policies for Partially Observable Markov Decision Processes, 1994.
[37] Michael I. Jordan, et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes, 1994, ICML.
[38] M. Littman. The Witness Algorithm: Solving Partially Observable Markov Decision Processes, 1994.
[39] Sebastian Thrun, et al. Finding Structure in Reinforcement Learning, 1994, NIPS.
[40] Leslie Pack Kaelbling, et al. Planning under Time Constraints in Stochastic Domains, 1995, Artif. Intell.
[41] Nicholas Kushmerick, et al. An Algorithm for Probabilistic Planning, 1995, Artif. Intell.
[42] Thomas Dean, et al. Decomposition Techniques for Planning in Stochastic Domains, 1995, IJCAI.
[43] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[44] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1995, ICML.
[45] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[46] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[47] R. Andrew McCallum, et al. Hidden state and reinforcement learning with instance-based state identification, 1996, IEEE Trans. Syst. Man Cybern. Part B.
[48] M. Paterson, et al. The complexity of mean payoff games on graphs, 1996.
[49] Alex Pentland, et al. Active gesture recognition using partially observable Markov decision processes, 1996, Proceedings of the 13th International Conference on Pattern Recognition.
[50] Craig Boutilier, et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations, 1996.
[51] Richard Washington, et al. Incremental Markov-model planning, 1996, Proceedings of the Eighth IEEE International Conference on Tools with Artificial Intelligence.
[52] Richard Washington, et al. Uncertainty and Real-Time Therapy Planning: Incremental Markov-Model Approaches, 1996.
[53] Robert Givan, et al. Model Minimization, Regression, and Propositional STRIPS Planning, 1997, IJCAI.
[54] David Andre, et al. Generalized Prioritized Sweeping, 1997, NIPS.
[55] Wenju Liu, et al. A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains, 1997, J. Artif. Intell. Res.
[56] R. Simmons, et al. Xavier: A Robot Navigation Architecture Based on Partially Observable Markov Decision Process Models, 1998.
[57] Richard Washington, et al. BI-POMDP: Bounded, Incremental, Partially-Observable Markov-Model Planning, 1997, ECP.
[58] Eric A. Hansen, et al. An Improved Policy Iteration Algorithm for Partially Observable MDPs, 1997, NIPS.
[59] Milos Hauskrecht, et al. Planning and control in stochastic domains with imperfect information, 1997.
[60] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.
[61] E. Allender, et al. Encyclopaedia of Complexity Results for Finite-Horizon Markov Decision Process Problems, 1997.
[62] Craig Boutilier, et al. Abstraction and Approximate Decision-Theoretic Planning, 1997, Artif. Intell.
[63] Robert Givan, et al. Model Minimization in Markov Decision Processes, 1997, AAAI/IAAI.
[64] Milos Hauskrecht, et al. Incremental Methods for Computing Bounds in Partially Observable Markov Decision Processes, 1997, AAAI/IAAI.
[65] Weihong Zhang, et al. Fast Value Iteration for Goal-Directed Markov Decision Processes, 1997, UAI.
[66] Ronen I. Brafman, et al. A Heuristic Variable Grid Solution Method for POMDPs, 1997, AAAI/IAAI.
[67] Shlomo Zilberstein, et al. Heuristic Search in Cyclic AND/OR Graphs, 1998, AAAI/IAAI.
[68] John Loch, et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes, 1998, ICML.
[69] Judy Goldsmith, et al. Complexity issues in Markov decision processes, 1998, Proceedings of the Thirteenth Annual IEEE Conference on Computational Complexity.
[70] Hector Geffner, et al. Solving Large POMDPs using Real Time Dynamic Programming, 1998.
[71] Michael Kearns, et al. Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms, 1998, NIPS.
[72] Michael L. Littman, et al. The Computational Complexity of Probabilistic Planning, 1998, J. Artif. Intell. Res.
[73] Peter Haddawy, et al. Utility Models for Goal-Directed, Decision-Theoretic Planners, 1998, Comput. Intell.
[74] A. Cassandra, et al. Exact and approximate algorithms for partially observable Markov decision processes, 1998.
[75] Ronen I. Brafman, et al. Structured Reachability Analysis for Markov Decision Processes, 1998, UAI.
[76] Shlomo Zilberstein, et al. Finite-memory control of partially observable systems, 1998.
[77] Milos Hauskrecht, et al. Modeling treatment of ischemic heart disease with partially observable Markov decision processes, 1998, AMIA.
[78] Ronald Parr, et al. Flexible Decomposition Algorithms for Weakly Coupled Markov Decision Problems, 1998, UAI.
[79] Anne Condon, et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems, 1999, AAAI/IAAI.
[80] Kee-Eung Kim, et al. Solving POMDPs by Searching the Space of Finite Policies, 1999, UAI.
[81] Weihong Zhang, et al. A Method for Speeding Up Value Iteration in Partially Observable Markov Decision Processes, 1999, UAI.
[82] Yishay Mansour, et al. Approximate Planning in Large POMDPs via Reusable Trajectories, 1999, NIPS.
[83] Craig Boutilier, et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.
[84] Dit-Yan Yeung, et al. An Environment Model for Nonstationary Reinforcement Learning, 1999, NIPS.
[85] Daphne Koller, et al. Policy Iteration for Factored MDPs, 2000, UAI.
[86] Milos Hauskrecht, et al. Planning treatment of ischemic heart disease with partially observable Markov decision processes, 2000, Artif. Intell. Medicine.
[87] Milos Hauskrecht, et al. Value-Function Approximations for Partially Observable Markov Decision Processes, 2000, J. Artif. Intell. Res.
[88] Craig Boutilier, et al. Stochastic dynamic programming with factored representations, 2000, Artif. Intell.
[89] Thomas G. Dietterich, et al. A POMDP Approximation Algorithm That Anticipates the Need to Observe, 2000, PRICAI.
[90] Judy Goldsmith, et al. Nonapproximability Results for Partially Observable Markov Decision Processes, 2001, J. Artif. Intell. Res.
[91] Blai Bonet, et al. Planning with Incomplete Information as Heuristic Search in Belief Space, 2000, AIPS.
[92] Zhengzhu Feng, et al. Dynamic Programming for POMDPs Using a Factored State Representation, 2000, AIPS.
[93] Weihong Zhang, et al. Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes, 2001, J. Artif. Intell. Res.
[94] Eric A. Hansen, et al. An Improved Grid-Based Approximation Algorithm for POMDPs, 2001, IJCAI.
[95] Weihong Zhang, et al. Value Iteration over Belief Subspace, 2001, ECSQARU.
[96] Weihong Zhang, et al. Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs, 2001, ECSQARU.