Value-gradient learning
[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.
[2] L. S. Pontryagin, et al. Mathematical Theory of Optimal Processes, 1962.
[3] E. Feigenbaum, et al. Computers and Thought, 1963.
[4] E. Blum, et al. The Mathematical Theory of Optimal Processes, 1963.
[5] L. M. Sonneborn, et al. The Bang-Bang Principle for Linear Control Systems, 1964.
[6] M. L. Chambers. The Mathematical Theory of Optimal Processes, 1965.
[7] A. L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers. II: Recent Progress, 1967.
[8] A. L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[9] Donald E. Kirk, et al. Optimal Control Theory: An Introduction, 1970.
[10] David Q. Mayne, et al. Differential dynamic programming, 1972, The Mathematical Gazette.
[11] Bernard Widrow, et al. Punish/Reward: Learning with a Critic in Adaptive Threshold Systems, 1973, IEEE Trans. Syst. Man Cybern.
[12] P. Werbos, et al. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, 1974.
[13] R. Godson. Elements of Intelligence, 1979.
[14] Louis B. Rall, et al. Automatic Differentiation: Techniques and Applications, 1981, Lecture Notes in Computer Science.
[15] Paul J. Werbos, et al. Applications of advances in nonlinear sensitivity analysis, 1982.
[16] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[17] Geoffrey E. Hinton, et al. Learning representations by back-propagating errors, 1986, Nature.
[18] Paul J. Werbos, et al. Building and Understanding Adaptive Systems: A Statistical/Numerical Approach to Factory Automation and Brain Research, 1987, IEEE Transactions on Systems, Man, and Cybernetics.
[19] Paul J. Werbos, et al. Neural networks for control and system identification, 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[20] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.
[21] Paul J. Werbos, et al. Backpropagation Through Time: What It Does and How to Do It, 1990, Proc. IEEE.
[22] Alexander Linden, et al. Inversion of neural networks by gradient descent, 1990, Parallel Comput.
[23] Vijaykumar Gullapalli, et al. A stochastic reinforcement learning algorithm for learning real-valued functions, 1990, Neural Networks.
[24] Panos J. Antsaklis, et al. Neural networks for control systems, 1990, IEEE Trans. Neural Networks.
[25] Paul J. Werbos, et al. Consistency of HDP applied to a simple reinforcement learning problem, 1990, Neural Networks.
[26] William H. Press, et al. Numerical Recipes in C (2nd ed.): The Art of Scientific Computing, 1992.
[27] Etienne Barnard, et al. Temporal-difference methods and Markov models, 1993, IEEE Trans. Syst. Man Cybern.
[28] M. F. Møller, et al. Exact Calculation of the Product of the Hessian Matrix of Feed-Forward Network Error Functions and a Vector in O(N) Time, 1993.
[29] Martin A. Riedmiller, et al. A direct adaptive method for faster backpropagation learning: The RPROP algorithm, 1993, IEEE International Conference on Neural Networks.
[30] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.
[31] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[32] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[33] Lutz Prechelt, et al. A Set of Neural Network Benchmark Problems and Benchmarking Rules, 1994.
[34] Barak A. Pearlmutter. Fast Exact Multiplication by the Hessian, 1994, Neural Computation.
[35] D. Signorini, et al. Neural networks, 1995, The Lancet.
[36] Richard S. Sutton, et al. A Menu of Designs for Reinforcement Learning Over Time, 1995.
[37] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[38] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[39] Roberto A. Santiago, et al. Adaptive critic designs: A case study for neurocontrol, 1995, Neural Networks.
[40] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[41] S. N. Balakrishnan, et al. Neurocontrol: A literature survey, 1996.
[42] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[43] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[44] George G. Lendaris, et al. Training strategies for critic and action neural networks in dual heuristic programming method, 1997, Proceedings of the International Conference on Neural Networks (ICNN'97).
[45] Donald C. Wunsch, et al. Convergence of critic-based training, 1997, IEEE International Conference on Systems, Man, and Cybernetics: Computational Cybernetics and Simulation.
[46] Martin T. Hagan, et al. Neural networks for control, 1999, Proceedings of the 1999 American Control Conference.
[47] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[48] Paul J. Werbos, et al. Stable adaptive control using new critic designs, 1998.
[49] Arthur L. Samuel, et al. Some studies in machine learning using the game of checkers, 2000, IBM J. Res. Dev.
[50] George G. Lendaris, et al. Adaptive critic design for intelligent steering and speed control of a 2-axle vehicle, 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000).
[51] Kenji Doya, et al. Reinforcement Learning in Continuous Time and Space, 2000, Neural Computation.
[52] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[53] Richard S. Sutton, et al. Comparing Policy-Gradient Algorithms, 2001.
[54] S. N. Balakrishnan, et al. State-constrained agile missile control with adaptive-critic-based neural networks, 2002, IEEE Trans. Control Syst. Technol.
[55] Robert F. Stengel, et al. An adaptive critic global controller, 2002, Proceedings of the 2002 American Control Conference.
[56] Rémi Coulom, et al. Reinforcement Learning Using Neural Networks, with Applications to Motor Control, 2002.
[57] George G. Lendaris, et al. Adaptive dynamic programming, 2002, IEEE Trans. Syst. Man Cybern. Part C.
[58] Jennie Si, et al. Helicopter Flight-Control Reconfiguration for Main Rotor Actuator Failures, 2003.
[59] Warren B. Powell, et al. Guidance in the Use of Adaptive Critics for Control, 2007.
[60] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[61] Dieter Fox, et al. Reinforcement learning for sensing strategies, 2004, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[62] Jennie Si, et al. ADP: Goals, Opportunities and Principles, 2004.
[63] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[64] Ben Tse, et al. Autonomous Inverted Helicopter Flight via Reinforcement Learning, 2004, ISER.
[65] A. Barto, et al. Model-Based Adaptive Critic Designs, 2004.
[66] Jennie Si, et al. Adaptive Critic Based Neural Network for Control-Constrained Agile Missile, 2004.
[67] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[68] Chao Lu, et al. Direct Neural Dynamic Programming Method for Power System Stability Enhancement, 2005, Proceedings of the 13th International Conference on Intelligent Systems Application to Power Systems.
[69] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[70] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[71] Rémi Munos, et al. Policy Gradient in Continuous Time, 2006, J. Mach. Learn. Res.
[72] Razvan V. Florian, et al. Correct equations for the dynamics of the cart-pole system, 2005.
[73] Warren B. Powell, et al. Handbook of Learning and Approximate Dynamic Programming, 2006, IEEE Transactions on Automatic Control.
[74] Rajesh P. N. Rao, et al. Bayesian Brain: Probabilistic Approaches to Neural Coding, 2006.
[75] Emanuel Todorov, et al. Optimal Control Theory, 2006.
[76] P. Werbos. Backwards Differentiation in AD and Neural Nets: Past Links and New Opportunities, 2006.
[77] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[78] Rémi Munos, et al. Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation, 2005, J. Mach. Learn. Res.
[79] P. J. Werbos, et al. Using ADP to Understand and Replicate Brain Intelligence: The Next Level Design, 2007, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[80] Michael Fairbank, et al. Reinforcement Learning by Value Gradients, 2008, arXiv.
[81] Frank L. Lewis, et al. Discrete-Time Nonlinear HJB Solution Using Approximate Dynamic Programming: Convergence Proof, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[82] Paul J. Werbos, et al. Foreword: ADP - The Key Direction for Future Research in Intelligent Control and Understanding Brain Intelligence, 2008, IEEE Trans. Syst. Man Cybern. Part B.
[83] Danil Prokhorov, et al. Computational Intelligence in Automotive Applications, 2008.
[84] Huaguang Zhang, et al. Adaptive Dynamic Programming: An Introduction, 2009, IEEE Computational Intelligence Magazine.
[85] George G. Lendaris, et al. A retrospective on Adaptive Dynamic Programming for control, 2009, International Joint Conference on Neural Networks.
[86] F. L. Lewis, et al. Reinforcement learning and adaptive dynamic programming for feedback control, 2009, IEEE Circuits and Systems Magazine.
[87] Shalabh Bhatnagar, et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation, 2009, ICML.
[88] Warren B. Powell, et al. What you should know about approximate dynamic programming, 2009, Naval Research Logistics (NRL).
[89] Shalabh Bhatnagar, et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation, 2009, NIPS.
[90] Shalabh Bhatnagar, et al. Toward Off-Policy Learning Control with Function Approximation, 2010, ICML.
[91] Richard S. Sutton, et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces, 2010, Artificial General Intelligence.
[92] P. Schrimpf, et al. Dynamic Programming, 2011.
[93] A. Heydari, et al. Finite-horizon input-constrained nonlinear optimal control using single network adaptive critics, 2011, Proceedings of the 2011 American Control Conference.
[94] Michael Fairbank, et al. The Local Optimality of Reinforcement Learning by Value Gradients, and its Relationship to Policy Gradient Learning, 2011, arXiv.
[95] Shuhui Li, et al. Vector control of a grid-connected rectifier/inverter using an artificial neural network, 2012, International Joint Conference on Neural Networks (IJCNN).
[96] Michael Fairbank, et al. A comparison of learning speed and ability to cope without exploration between DHP and TD(0), 2012, International Joint Conference on Neural Networks (IJCNN).
[97] Richard S. Sutton, et al. Temporal-difference search in computer Go, 2012, Machine Learning.
[98] F. Lewis, et al. Reinforcement Learning and Feedback Control: Using Natural Decision Methods to Design Optimal Adaptive Controllers, 2012, IEEE Control Systems.
[99] Michael Fairbank, et al. The divergence of reinforcement learning algorithms with value-iteration and function approximation, 2011, International Joint Conference on Neural Networks (IJCNN 2012).
[100] Michael Fairbank, et al. Simple and Fast Calculation of the Second-Order Gradients for Globalized Dual Heuristic Dynamic Programming in Neural Networks, 2012, IEEE Transactions on Neural Networks and Learning Systems.
[101] Michael Fairbank, et al. An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time, 2013, IEEE Transactions on Neural Networks and Learning Systems.
[102] Michael Fairbank, et al. Approximating Optimal Control with Value Gradient Learning, 2013.
[103] Michael Fairbank, et al. The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming, 2013, arXiv.
[104] Frank L. Lewis, et al. Reinforcement Learning and Approximate Dynamic Programming (RLADP) - Foundations, Common Misconceptions, and the Challenges Ahead, 2013.
[105] Michael Fairbank, et al. Clipping in Neurocontrol by Adaptive Dynamic Programming, 2014, IEEE Transactions on Neural Networks and Learning Systems.
[106] Shuhui Li, et al. An adaptive recurrent neural-network controller using a stabilization matrix and predictive inputs to solve a tracking problem under disturbances, 2014, Neural Networks.