Learning Without State-Estimation in Partially Observable Markovian Decision Processes

[1]  P. Anandan,et al.  Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[2]  Satinder P. Singh,et al.  Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.

[3]  Richard S. Sutton,et al.  Sequential Decision Problems and Neural Networks , 1989, NIPS.

[4]  Tom M. Mitchell,et al.  Reinforcement learning with hidden states , 1993.

[5]  Longxin Lin,et al.  Reinforcement Learning in Non-Markov Environments , 1992.

[6]  Dimitri P. Bertsekas,et al.  Dynamic Programming: Deterministic and Stochastic Models , 1987.

[7]  Ben J. A. Kröse,et al.  Learning from delayed rewards , 1995, Robotics Auton. Syst.

[8]  C. Watkins.  Learning from delayed rewards , 1989.

[9]  Andrew G. Barto,et al.  Monte Carlo Matrix Inversion and Reinforcement Learning , 1993, NIPS.

[10]  Richard S. Sutton,et al.  Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[11]  Anton Schwartz,et al.  A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.

[12]  Andrew McCallum,et al.  Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.

[13]  Lonnie Chrisman,et al.  Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.

[14]  Steven Douglas Whitehead,et al.  Reinforcement learning for the adaptive control of perception and action , 1992.

[15]  Thomas G. Dietterich.  Machine learning , 1996, CSUR.

[16]  Dana H. Ballard,et al.  Active Perception and Reinforcement Learning , 1990, Neural Computation.

[17]  Richard S. Sutton,et al.  Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.

[18]  Andrew G. Barto,et al.  Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell.

[19]  Kumpati S. Narendra,et al.  Learning Automata - A Survey , 1974, IEEE Transactions on Systems, Man, and Cybernetics.

[20]  Edward J. Sondik,et al.  The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res.