Reinforcement Learning Using Neural Networks, with Applications to Motor Control.
[1] Norbert Wiener. Cybernetics, or Control and Communication in the Animal and the Machine, 1949.
[2] Ronald A. Howard. Dynamic Programming and Markov Processes, 1960.
[3] James S. Albus. A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), 1975.
[4] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[5] Geoffrey E. Hinton, et al. Learning internal representations by error propagation, 1986.
[6] Y. Le Cun. Learning Processes in an Asymmetric Threshold Network, 1986.
[7] S. V. Emel'yanov, et al. Higher-order sliding modes in binary control systems, 1986.
[8] Yann LeCun, et al. Learning processes in an asymmetric threshold network, 1986.
[9] Charles W. Anderson, et al. Strategy Learning with Multilayer Connectionist Representations, 1987.
[10] Scott E. Fahlman, et al. An empirical study of learning speed in back-propagation networks, 1988.
[11] Aleksej F. Filippov, et al. Differential Equations with Discontinuous Righthand Sides, 1988, Mathematics and Its Applications.
[12] John Moody, et al. Fast Learning in Networks of Locally-Tuned Processing Units, 1989, Neural Computation.
[13] William H. Press, et al. Numerical Recipes in C: The Art of Scientific Computing, 1989.
[14] William H. Press, et al. Numerical Recipes: The Art of Scientific Computing, Second Edition, 1998.
[15] Todd K. Leen, et al. Weight Space Probability Densities in Stochastic Learning: II. Transients and Basin Hopping Times, 1992, NIPS.
[16] Andrew R. Barron, et al. Universal approximation bounds for superpositions of a sigmoidal function, 1993, IEEE Trans. Inf. Theory.
[17] Martin Fodslette Møller. A scaled conjugate gradient algorithm for fast supervised learning, 1993, Neural Networks.
[18] Christopher G. Atkeson, et al. Using Local Trajectory Optimizers to Speed Up Global Optimization in Dynamic Programming, 1993, NIPS.
[19] Martin A. Riedmiller, et al. A direct adaptive method for faster backpropagation learning: the RPROP algorithm, 1993, IEEE International Conference on Neural Networks.
[20] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.
[21] Leemon C. Baird. Reinforcement Learning With High-Dimensional, Continuous Actions, 1993.
[22] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[23] Karl Sims, et al. Evolving virtual creatures, 1994, SIGGRAPH.
[24] Michael I. Jordan, et al. Technical report, Massachusetts Institute of Technology, Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[25] J. Shewchuk. An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, 1994.
[26] Karl Sims, et al. Evolving 3D Morphology and Behavior by Competition, 1994, Artificial Life.
[27] S. Schaal, et al. Robot juggling: implementation of memory-based learning, 1994, IEEE Control Systems.
[28] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[29] Gerald Tesauro. Temporal Difference Learning and TD-Gammon, 1995, J. Int. Comput. Games Assoc.
[30] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[31] Mark W. Spong. The swing up control problem for the Acrobot, 1995.
[32] Kenji Doya. Temporal Difference Learning in Continuous Time and Space, 1995, NIPS.
[33] Thomas G. Dietterich, et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network, 1995, NIPS.
[34] Dimitri P. Bertsekas. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[35] Richard S. Sutton. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996.
[36] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[37] Leemon C. Baird. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[38] Todd K. Leen, et al. Using Curvature Information for Fast Stochastic Search, 1996, NIPS.
[39] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[40] Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming, Athena Scientific, 1996.
[41] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[42] Ralph Neuneier, et al. How to Train Neural Networks, 1996, Neural Networks: Tricks of the Trade.
[43] Gary Boone. Efficient reinforcement learning: model-based Acrobot control, 1997, Proceedings of International Conference on Robotics and Automation.
[44] Ashwin Ram, et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, 1997, Adapt. Behav.
[45] Stephan Pareigis. Adaptive Choice of Grid and Time in Reinforcement Learning, 1997, NIPS.
[46] Christopher G. Atkeson, et al. A comparison of direct and model-based reinforcement learning, 1997, Proceedings of International Conference on Robotics and Automation.
[47] Gary Boone. Minimum-time control of the Acrobot, 1997, Proceedings of International Conference on Robotics and Automation.
[48] Rémi Munos. Apprentissage par renforcement, étude du cas continu (Reinforcement learning: a study of the continuous case), 1997.
[49] Rémi Munos. A Convergent Reinforcement Learning Algorithm in the Continuous Case Based on a Finite Difference Method, 1997, IJCAI.
[50] Jun Morimoto, et al. Hierarchical Reinforcement Learning of Low-Dimensional Subgoals and High-Dimensional Trajectories, 1998, ICONIP.
[51] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[52] Andrew Tridgell, et al. Experiments in Parameter Learning Using Temporal Differences, 1998, J. Int. Comput. Games Assoc.
[53] Marios M. Polycarpou, et al. Preventing unlearning during online training of feedforward networks, 1998, Proceedings of the IEEE International Symposium on Intelligent Control (ISIC), held jointly with the IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA).
[54] Marios M. Polycarpou, et al. An analytical framework for local feedforward networks, 1998, IEEE Trans. Neural Networks.
[55] Andrew W. Moore, et al. Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems, 1999, IJCAI.
[56] Andrew W. Moore, et al. Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation, 1999, IJCNN'99, International Joint Conference on Neural Networks.
[57] Yasuharu Koike, et al. Multiple state estimation reinforcement learning for driving model: driver model of automobile, 1999, IEEE SMC'99, IEEE International Conference on Systems, Man, and Cybernetics.
[58] Nicol N. Schraudolph, et al. Local Gain Adaptation in Stochastic Gradient Descent, 1999.
[59] K. Kreutz-Delgado, et al. Obtaining minimum energy biped walking gaits with symbolic models and numerical optimal control, 1999.
[60] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[61] Alexander Zelinsky, et al. Q-Learning in Continuous State and Action Spaces, 1999, Australian Joint Conference on Artificial Intelligence.
[62] Junichiro Yoshimoto, et al. Application of reinforcement learning to balancing of Acrobot, 1999, IEEE SMC'99, IEEE International Conference on Systems, Man, and Cybernetics.
[63] Michiel van de Panne, et al. Control for Simulated Human and Animal Motion, 2000.
[64] Charles W. Anderson, et al. Approximating a Policy Can Be Easier Than Approximating a Value Function, 2000.
[65] Kenji Doya. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[66] Vladimir N. Vapnik. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[67] Jun Morimoto, et al. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, 2000, Robotics Auton. Syst.
[68] John N. Tsitsiklis. On the Convergence of Optimistic Policy Iteration, 2002, J. Mach. Learn. Res.
[69] Thomas G. Dietterich, et al., editors. Advances in Neural Information Processing Systems, 2002.
[70] Jean-Arcady Meyer, et al. Evolutionary approaches to neural control of rolling, walking, swimming and flying animats or robots, 2003.
[71] Peter Dayan. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[72] Terrence J. Sejnowski, et al. TD(λ) Converges with Probability 1, 1994, Machine Learning.
[73] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 1996, Machine Learning.
[74] Andrew G. Barto, et al. Elevator Group Control Using Multiple Reinforcement Learning Agents, 1998, Machine Learning.
[75] H. J. Pesch, et al. Real-time Collision Avoidance against Wrong Drivers: Differential Game Approach, Numerical Solution and Synthesis of Strategies with Neural Networks, 1995.