Paper information - Learning Without State-Estimation in Partially Observable Markovian Decision Processes - 字舞流文

Learning Without State-Estimation in Partially Observable Markovian Decision Processes

Michael I. Jordan | Tommi S. Jaakkola | Satinder P. Singh

[1] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[2] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[3] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[4] Satinder P. Singh, et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes, 1994, AAAI.

[5] John N. Tsitsiklis, et al. Asynchronous stochastic approximation and Q-learning, 1993, Proceedings of 32nd IEEE Conference on Decision and Control.

[6] Andrew G. Barto, et al. Monte Carlo Matrix Inversion and Reinforcement Learning, 1993, NIPS.

[7] Tom M. Mitchell, et al. Reinforcement learning with hidden states, 1993.

[8] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.

[9] Andrew McCallum, et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.

[10] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.

[11] Steven Douglas Whitehead, et al. Reinforcement learning for the adaptive control of perception and action, 1992.

[12] Longxin Lin, et al. Reinforcement Learning in Non-Markov Environments, 1992.

[13] Dana H. Ballard, et al. Active Perception and Reinforcement Learning, 1990, Neural Computation.

[14] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.

[15] C. Watkins. Learning from delayed rewards, 1989.

[16] Richard S. Sutton, et al. Sequential Decision Problems and Neural Networks, 1989, NIPS 1989.

[17] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.

[18] P. Anandan, et al. Pattern-recognizing stochastic learning automata, 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[19] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[20] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs, 1978, Oper. Res.

[21] Kumpati S. Narendra, et al. Learning Automata - A Survey, 1974, IEEE Trans. Syst. Man Cybern.