Online Model Learning Algorithms for Actor-Critic Control
[1] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[2] M. Ciletti,et al. The computation and theory of optimal control , 1972 .
[3] David Q. Mayne,et al. Differential dynamic programming , 1972, The Mathematical Gazette.
[4] L. Hasdorff. Gradient Optimization and Nonlinear Control , 1976 .
[5] Ian H. Witten,et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments , 1977, Inf. Control..
[6] Jon Louis Bentley,et al. Data Structures for Range Searching , 1979, CSUR.
[7] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[8] Gene F. Franklin,et al. Feedback Control of Dynamic Systems , 1986 .
[9] Peter W. Glynn,et al. Likelihood ratio gradient estimation: an overview , 1987, WSC '87.
[10] Proceedings of the 1987 Winter Simulation Conference , 1988 .
[11] C. Watkins. Learning from delayed rewards , 1989 .
[12] Vijaykumar Gullapalli,et al. A stochastic reinforcement learning algorithm for learning real-valued functions , 1990, Neural Networks.
[13] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[14] Oliver G. Selfridge,et al. Real-time learning: a ball on a beam , 1992, [Proceedings 1992] IJCNN International Joint Conference on Neural Networks.
[15] Vijaykumar Gullapalli,et al. Learning Control Under Extreme Uncertainty , 1992, NIPS.
[16] R. J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[17] Reinforcement Learning Architectures , 1992 .
[18] J. Spall. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation , 1992 .
[19] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[20] Andrew G. Barto,et al. Adaptive linear quadratic control using policy iteration , 1994, Proceedings of 1994 American Control Conference - ACC '94.
[21] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[22] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[23] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[24] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[25] Richard S. Sutton,et al. Model-Based Reinforcement Learning with an Approximate, Learned Model , 1996 .
[26] Kwang Y. Lee,et al. An optimal tracking neuro-controller for nonlinear dynamic systems , 1996, IEEE Trans. Neural Networks.
[27] Ian Postlethwaite,et al. Multivariable Feedback Control: Analysis and Design , 1996 .
[28] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[29] Leslie Pack Kaelbling,et al. Recent Advances in Reinforcement Learning , 1996, Springer US.
[30] John N. Tsitsiklis,et al. Analysis of Temporal-Difference Learning with Function Approximation , 1996, NIPS.
[31] Andrew G. Barto,et al. Linear Least-Squares Algorithms for Temporal Difference Learning , 2005, Machine Learning.
[32] Judy A. Franklin,et al. Biped dynamic walking using reinforcement learning , 1997, Robotics Auton. Syst..
[33] V. Borkar. Stochastic approximation with two time scales , 1997 .
[34] Stefan Schaal,et al. Robot Learning From Demonstration , 1997, ICML.
[35] Benjamin Van Roy,et al. Average cost temporal-difference learning , 1997, Proceedings of the 36th IEEE Conference on Decision and Control.
[36] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[37] Shun-ichi Amari,et al. Why natural gradient? , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[38] Young-Moon Park,et al. A receding horizon optimal tracking neurocontroller for nonlinear dynamic systems , 1998 .
[39] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[40] W. Ames. Mathematics in Science and Engineering , 1999 .
[41] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[42] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[43] Kenji Doya,et al. Reinforcement Learning in Continuous Time and Space , 2000, Neural Computation.
[44] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[45] Vivek S. Borkar,et al. A sensitivity formula for risk-sensitive cost and the actor-critic algorithm , 2001, Syst. Control. Lett..
[46] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[47] J. Spall. Stochastic Optimization , 2002 .
[48] Shigenobu Kobayashi,et al. Reinforcement learning of walking behavior for a four-legged robot , 2001, Proceedings of the 40th IEEE Conference on Decision and Control (Cat. No.01CH37228).
[49] Ralf Schoknecht,et al. Optimality of Reinforcement Learning Algorithms with Linear Function Approximation , 2002, NIPS.
[50] Jeffrey M. Forbes,et al. Representations for learning control policies , 2002 .
[51] J. Bagnell,et al. Policy search in kernel Hilbert space , 2003 .
[52] Hamid R. Berenji,et al. A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters , 2003, IEEE Trans. Fuzzy Syst..
[53] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[54] Jeff G. Schneider,et al. Covariant Policy Search , 2003, IJCAI.
[55] Y. Narahari,et al. Reinforcement learning applications in dynamic pricing of retail markets , 2003, IEEE International Conference on E-Commerce, 2003. CEC 2003..
[56] Vijay R. Konda,et al. On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..
[57] Tony R. Martinez,et al. Reduction Techniques for Instance-Based Learning Algorithms , 2000, Machine Learning.
[58] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[59] David W. Aha,et al. A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms , 1997, Artificial Intelligence Review.
[60] Ming Lu,et al. Proceedings of the Third International Conference on Machine Learning and Cybernetics , 2004 .
[61] Ben Tse,et al. Autonomous Inverted Helicopter Flight via Reinforcement Learning , 2004, ISER.
[62] Yu-hu Cheng,et al. Application of actor-critic learning to adaptive state space construction , 2004, Proceedings of 2004 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.04EX826).
[63] John N. Tsitsiklis,et al. Feature-based methods for large scale dynamic programming , 2004, Machine Learning.
[64] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[65] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[66] Longxin Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching , 2004, Machine Learning.
[67] Nicholas Bambos,et al. A fuzzy reinforcement learning approach to power control in wireless transmitters , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[68] Martin A. Riedmiller,et al. CBR for State Value Function Approximation in Reinforcement Learning , 2005, ICCBR.
[69] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[70] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[71] Jongho Kim,et al. An RLS-Based Natural Actor-Critic Algorithm for Locomotion of a Two-Linked Robot Arm , 2005, CIS.
[72] Tao Xiong,et al. A combined SVM and LDA approach for classification , 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005..
[73] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[74] A. Willsky,et al. Importance sampling actor-critic algorithms , 2006, 2006 American Control Conference.
[75] Warren B. Powell,et al. Handbook of Learning and Approximate Dynamic Programming , 2006, IEEE Transactions on Automatic Control.
[76] Jin Yu,et al. Natural Actor-Critic for Road Traffic Optimisation , 2006, NIPS.
[77] Frank L. Lewis,et al. Fixed-Final Time Constrained Optimal Control of Nonlinear Systems Using Neural Network HJB Approach , 2006, CDC.
[78] Javier A. Barria,et al. Reinforcement Learning for Resource Allocation in LEO Satellite Networks , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[79] Shalabh Bhatnagar,et al. Incremental Natural Actor-Critic Algorithms , 2007, NIPS.
[80] H. Robbins. A Stochastic Approximation Method , 1951 .
[81] Shin Ishii,et al. Reinforcement learning for a biped robot based on a CPG-actor-critic method , 2007, Neural Networks.
[82] Xuesong Wang,et al. A fuzzy Actor-Critic reinforcement learning network , 2007, Inf. Sci..
[83] Lyle Noakes,et al. Continuous-Time Adaptive Critics , 2007, IEEE Transactions on Neural Networks.
[84] Warren B. Powell,et al. Approximate Dynamic Programming - Solving the Curses of Dimensionality , 2007 .
[85] Martin A. Riedmiller,et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[86] Shalabh Bhatnagar,et al. Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes , 2008, Simul..
[87] Manuel Lopes,et al. Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs , 2008, ECML/PKDD.
[88] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[89] Junichiro Yoshimoto,et al. A New Natural Policy Gradient by Stationary Distribution Metric , 2008, ECML/PKDD.
[90] Kristian Kersting,et al. Non-parametric policy gradients: a unified treatment of propositional and relational domains , 2008, ICML '08.
[91] Jan Peters,et al. Fitted Q-iteration by Advantage Weighted Regression , 2008, NIPS.
[92] Chun-Gui Li,et al. A Multi-agent Reinforcement Learning using Actor-Critic methods , 2008, 2008 International Conference on Machine Learning and Cybernetics.
[93] H. Kimura. Natural gradient actor-critic algorithms using random rectangular coarse coding , 2008, 2008 SICE Annual Conference.
[94] Philippe Preux,et al. Basis Expansion in Natural Actor Critic Methods , 2008, EWRL.
[95] A Consolidated Actor-Critic Model with Function Approximation for High-Dimensional POMDPs , 2008 .
[96] Jan Peters,et al. Policy Search for Motor Primitives in Robotics , 2008, NIPS 2008.
[97] Stefan Schaal,et al. Reinforcement learning of motor skills with policy gradients , 2008, Neural Networks (2008 Special Issue).
[98] Ioannis Ch. Paschalidis,et al. An actor-critic method using Least Squares Temporal Difference learning , 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[99] Wang Meng,et al. Urban Traffic Signal Learning Control Using Fuzzy Actor-Critic Methods , 2009, 2009 Fifth International Conference on Natural Computation.
[100] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[101] Abhijit Gosavi,et al. Reinforcement Learning: A Tutorial Survey and Recent Advances , 2009, INFORMS J. Comput..
[102] Luigi Fortuna,et al. Reinforcement Learning and Adaptive Dynamic Programming for Feedback Control , 2009 .
[103] Frank L. Lewis,et al. Online actor critic algorithm to solve the continuous-time infinite horizon optimal control problem , 2009, 2009 International Joint Conference on Neural Networks.
[104] Meng Wang,et al. Urban Traffic Signal Learning Control Using Fuzzy Actor-Critic Methods , 2009, ICNC.
[105] Marko Grobelnik,et al. Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II , 2009 .
[106] Marc Toussaint,et al. Learning model-free robot control by a Monte Carlo EM algorithm , 2009, Auton. Robots.
[107] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[108] Junichiro Yoshimoto,et al. A Generalized Natural Actor-Critic Algorithm , 2009, NIPS.
[109] Pawel Wawrzynski,et al. Real-time reinforcement learning by sequential Actor-Critics and experience replay , 2009, Neural Networks.
[110] Andres El-Fakdi,et al. Two steps natural actor critic learning for underwater cable tracking , 2010, 2010 IEEE International Conference on Robotics and Automation.
[111] A. Gosavi. Finite horizon Markov control with one-step variance penalties , 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[112] Yasemin Altun,et al. Relative Entropy Policy Search , 2010 .
[113] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[114] Byungchan Kim,et al. Impedance learning for Robotic Contact Tasks using Natural Actor-Critic Algorithm , 2010 .
[115] Alessandro Lazaric,et al. Finite-Sample Analysis of LSTD , 2010, ICML.
[116] Junichiro Yoshimoto,et al. Derivatives of Logarithmic Stationary Distributions for Policy Gradient Reinforcement Learning , 2010, Neural Computation.
[117] Bart De Schutter,et al. Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .
[118] Rafael Castro-Linares,et al. Trajectory tracking for non-holonomic cars: A linear approach to controlled leader-follower formation , 2010, 49th IEEE Conference on Decision and Control (CDC).
[119] Shalabh Bhatnagar. An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes , 2010, Syst. Control. Lett..
[120] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[121] Ioannis Ch. Paschalidis,et al. A Distributed Actor-Critic Algorithm and Applications to Mobile Sensor Network Coordination Problems , 2010, IEEE Transactions on Automatic Control.
[122] Aude Billard,et al. Learning Stable Nonlinear Dynamical Systems With Gaussian Mixture Models , 2011, IEEE Transactions on Robotics.
[123] Peter A. Flach,et al. Proceedings of the 28th International Conference on Machine Learning , 2011 .
[124] Carl E. Rasmussen,et al. PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.
[125] Robert Babuska,et al. Actor-Critic Control with Reference Model Learning , 2011 .
[126] Fernando José Von Zuben,et al. A neural architecture to address Reinforcement Learning problems , 2011, The 2011 International Joint Conference on Neural Networks.
[127] U. Rieder,et al. Markov Decision Processes with Applications to Finance , 2011 .
[128] Jason Pazis,et al. Reinforcement learning in multidimensional continuous action spaces , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[129] David Barber,et al. Lagrange Dual Decomposition for Finite Horizon Markov Decision Processes , 2011, ECML/PKDD.
[130] Robert Babuska,et al. A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[131] Robert Babuska,et al. Experience Replay for Real-Time Reinforcement Learning Control , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[132] Robert Babuska,et al. Efficient Model Learning Methods for Actor–Critic Control , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[133] Jean-Paul Chilès,et al. Wiley Series in Probability and Statistics , 2012 .
[134] Robert Babuska,et al. Model learning actor-critic algorithms: Performance evaluation in a motion control task , 2012, 2012 IEEE 51st IEEE Conference on Decision and Control (CDC).
[135] Derong Liu,et al. Finite-horizon neural optimal tracking control for a class of nonlinear systems with unknown dynamics , 2012, Proceedings of the 10th World Congress on Intelligent Control and Automation.
[136] Plamen Angelov,et al. Proceedings of the 2013 International Joint Conference on Neural Networks , 2013 .
[137] Hao Xu,et al. Solutions to finite horizon cost problems using actor-critic reinforcement learning , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).
[138] Robert Babuska,et al. Model-free and model-based time-optimal control of a badminton robot , 2013, 2013 9th Asian Control Conference (ASCC).
[139] Robert Babuska,et al. Comparison of model-free and model-based methods for time optimal hit control of a badminton robot , 2014 .
[140] Robert Babuska,et al. Learning rate free reinforcement learning for real-time motion control using a value-gradient based policy , 2014 .
[141] P. Glynn. Likelihood Ratio Gradient Estimation: An Overview , 1987 .
[142] Fuzzy Logic in Control Systems: Fuzzy Logic , 2022 .