Learning Without State-Estimation in Partially Observable Markovian Decision Processes
暂无分享,去创建一个
[1] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[2] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[3] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[4] Satinder P. Singh,et al. Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes , 1994, AAAI.
[5] John N. Tsitsiklis,et al. Asynchronous stochastic approximation and Q-learning , 1993, Proceedings of 32nd IEEE Conference on Decision and Control.
[6] Andrew G. Barto,et al. Monte Carlo Matrix Inversion and Reinforcement Learning , 1993, NIPS.
[7] Tom M. Mitchell,et al. Reinforcement learning with hidden states , 1993 .
[8] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[9] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[10] Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.
[11] Steven Douglas Whitehead,et al. Reinforcement learning for the adaptive control of perception and action , 1992 .
[12] Longxin Lin,et al. Reinforcement Learning in Non-Markov Environments , 1992 .
[13] Dana H. Ballard,et al. Active Perception and Reinforcement Learning , 1990, Neural Computation.
[14] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[15] C. Watkins. Learning from delayed rewards , 1989 .
[16] Richard S. Sutton,et al. Sequential Decision Problems and Neural Networks , 1989, NIPS 1989.
[17] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[18] P. Anandan,et al. Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[19] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[20] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res..
[21] Kumpati S. Narendra,et al. Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..