New algorithms of the Q-learning type
[1] J. C. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control, 1992.
[2] J. A. Boyan and M. L. Littman. Packet routing in dynamically changing networks: a reinforcement learning approach. NIPS, 1993.
[3] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
[4] J. C. Spall. A one-measurement form of simultaneous perturbation stochastic approximation. Automatica, 1997.
[5] V. S. Borkar. Stochastic approximation with two time scales. Systems & Control Letters, 1997.
[6] V. S. Borkar. Asynchronous stochastic approximations. SIAM Journal on Control and Optimization, 1998.
[7] V. R. Konda and V. S. Borkar. Actor-critic-type learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization, 1999.
[8] P. Marbach, O. Mihatsch, and J. N. Tsitsiklis. Call admission control and routing in integrated services networks using neuro-dynamic programming. IEEE Journal on Selected Areas in Communications, 2000.
[9] S. Bhatnagar, M. C. Fu, S. I. Marcus, and I-J. Wang. Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences. ACM Transactions on Modeling and Computer Simulation, 2003.
[10] C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 1992.
[11] J. N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 1994.