Learning Policies for Partially Observable Environments: Scaling Up

[1] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[2] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.

[3] Craig Boutilier, et al. Exploiting Structure in Policy Construction, 1995, IJCAI.

[4] Nicholas Kushmerick, et al. An Algorithm for Probabilistic Planning, 1995, Artif. Intell.

[5] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.

[6] M. Littman. The Witness Algorithm: Solving Partially Observable Markov Decision Processes, 1994.

[7] Leslie Pack Kaelbling, et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.

[8] Anthony R. Cassandra, et al. Optimal Policies for Partially Observable Markov Decision Processes, 1994.

[9] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[10] Michael I. Jordan, et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES, 1996.

[11] Leslie Pack Kaelbling, et al. Toward Approximate Planning in Very Large Stochastic Domains, 1994, AAAI 1994.

[12] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1993, Proceedings of 32nd IEEE Conference on Decision and Control.

[13] Sridhar Mahadevan, et al. Rapid Task Learning for Real Robots, 1993.

[14] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.

[15] R. A. McCallum. First Results with Utile Distinction Memory for Reinforcement Learning, 1992.

[16] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.

[17] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes, 1991.

[18] Hsien-Te Cheng, et al. Algorithms for partially observable Markov decision processes, 1989.

[19] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.

[20] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.

[21] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.

[22] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs, 1978, Oper. Res.

[23] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.

[24] Karl Johan Åström, et al. Optimal control of Markov processes with incomplete state information, 1965.