Learning Multidimensional Control Actions From Delayed Reinforcements
[1] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[2] C. Watkins. Learning from delayed rewards, 1989.
[3] Richard W. Prager, et al. A Modular Q-Learning Architecture for Manipulator Task Decomposition, 1994, ICML.
[4] Paweł Cichosz, et al. Reinforcement Learning Algorithms Based on the Methods of Temporal Differences, 1994.
[5] Paweł Cichosz. Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning, 1995.