New algorithms of the Q-learning type
[1] J. C. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control, 1992.
[2] J. A. Boyan and M. L. Littman. Packet routing in dynamically changing networks: a reinforcement learning approach. NIPS, 1993.
[3] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[4] J. C. Spall. A one-measurement form of simultaneous perturbation stochastic approximation. Automatica, 1997.
[5] V. S. Borkar. Stochastic approximation with two time scales. Systems & Control Letters, 1997.
[6] V. S. Borkar. Asynchronous stochastic approximations. SIAM Journal on Control and Optimization, 1998.
[7] V. R. Konda and V. S. Borkar. Actor-critic-type learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization, 1999.
[8] P. Marbach, O. Mihatsch, and J. N. Tsitsiklis. Call admission control and routing in integrated services networks using neuro-dynamic programming. IEEE Journal on Selected Areas in Communications, 2000.
[9] S. Bhatnagar, M. C. Fu, S. I. Marcus, and I-J. Wang. Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences. ACM Transactions on Modeling and Computer Simulation, 2003.
[10] C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 1992.
[11] J. N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 1994.