Learning Policies for Partially Observable Environments: Scaling Up
Leslie Pack Kaelbling | Michael L. Littman | Anthony R. Cassandra
[1] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[2] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[3] Craig Boutilier, et al. Exploiting Structure in Policy Construction, 1995, IJCAI.
[4] Nicholas Kushmerick, et al. An Algorithm for Probabilistic Planning, 1995, Artif. Intell.
[5] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.
[6] M. Littman. The Witness Algorithm: Solving Partially Observable Markov Decision Processes, 1994.
[7] Leslie Pack Kaelbling, et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.
[8] Anthony R. Cassandra, et al. Optimal Policies for Partially Observable Markov Decision Processes, 1994.
[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[10] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.
[11] Leslie Pack Kaelbling, et al. Toward Approximate Planning in Very Large Stochastic Domains, 1994, AAAI 1994.
[12] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1993, Proceedings of 32nd IEEE Conference on Decision and Control.
[13] Sridhar Mahadevan, et al. Rapid Task Learning for Real Robots, 1993.
[14] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.
[15] R. A. McCallum. First Results with Utile Distinction Memory for Reinforcement Learning, 1992.
[16] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[17] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes, 1991.
[18] Hsien-Te Cheng, et al. Algorithms for partially observable Markov decision processes, 1989.
[19] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[20] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[21] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[22] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs, 1978, Oper. Res.
[23] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[24] Karl Johan Åström, et al. Optimal control of Markov processes with incomplete state information, 1965.