Ensemble Algorithms in Reinforcement Learning
暂无分享,去创建一个
[1] K. Narendra,et al. Learning AutomataA Survey , 1974 .
[2] Kumpati S. Narendra,et al. Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..
[3] C. Watkins. Learning from delayed rewards , 1989 .
[4] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[5] Satinder P. Singh,et al. The Efficient Learning of Multiple Task Sequences , 1991, NIPS.
[6] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[7] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[8] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[9] Chen K. Tham,et al. Reinforcement learning of multiple tasks using a hierarchical CMAC architecture , 1995, Robotics Auton. Syst..
[10] Richard S. Sutton,et al. Generalization in ReinforcementLearning : Successful Examples UsingSparse Coarse , 1996 .
[11] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[12] P. J. Werbos,et al. Generalized maze navigation: SRN critics solve what feedforward or Hebbian nets cannot , 1996, 1996 IEEE International Conference on Systems, Man and Cybernetics. Information Intelligence and Systems (Cat. No.96CH35929).
[13] Yoav Freund,et al. Experiments with a New Boosting Algorithm , 1996, ICML.
[14] Ron Sun,et al. Multi-agent reinforcement learning: weighting and partitioning , 1999, Neural Networks.
[15] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[16] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[17] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[18] Andrew W. Moore,et al. Prioritized sweeping: Reinforcement learning with less data and less time , 2004, Machine Learning.
[19] Leo Breiman,et al. Bagging Predictors , 1996, Machine Learning.
[20] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[21] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[22] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[23] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[24] R. Clayton,et al. Epicardial ECG Mapping of Human Ventricular Fibrillation , 2006 .
[25] M.A. Wiering,et al. Two Novel On-policy Reinforcement Learning Algorithms based on TD(λ)-methods , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.