[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[2] James S. Albus,et al. A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC) , 1975 .
[3] Lennart Ljung,et al. Analysis of recursive stochastic algorithms , 1977 .
[4] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[5] Peter C. Young,et al. Recursive Estimation and Time-Series Analysis: An Introduction , 1984 .
[6] P. Kumar,et al. Theory and practice of recursive identification , 1985, IEEE Transactions on Automatic Control.
[7] S. Haykin,et al. Adaptive Filter Theory , 1986 .
[8] Hong Wang,et al. Recursive estimation and time-series analysis , 1986, IEEE Trans. Acoust. Speech Signal Process..
[9] David D. Falconer,et al. Tracking properties and steady-state performance of RLS adaptive filter algorithms , 1986, IEEE Trans. Acoust. Speech Signal Process..
[10] Eweda Eweda,et al. Convergence of the RLS and LMS adaptive filters , 1987 .
[11] Hamid R. Berenji,et al. Learning and tuning fuzzy logic controllers through reinforcements , 1992, IEEE Trans. Neural Networks.
[12] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[13] P. Dayan,et al. TD(λ) Converges with Probability 1 , 1994 .
[14] C. S. George Lee,et al. Reinforcement structure/parameter learning for neural-network-based fuzzy logic control systems , 1994, IEEE Trans. Fuzzy Syst..
[15] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[16] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[17] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[18] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[19] George V. Moustakides. Study of the transient phase of the forgetting factor RLS , 1997, IEEE Trans. Signal Process..
[20] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[21] Justin A. Boyan,et al. Least-Squares Temporal Difference Learning , 1999, ICML.
[22] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[23] Arthur L. Samuel,et al. Some studies in machine learning using the game of checkers , 2000, IBM J. Res. Dev..
[24] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[25] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[26] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[27] John N. Tsitsiklis,et al. Asynchronous Stochastic Approximation and Q-Learning , 1994, Machine Learning.
[28] Terrence J. Sejnowski,et al. TD(λ) Converges with Probability 1 , 1994, Machine Learning.
[29] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[30] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[31] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[32] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.