An Actor-Critic Algorithm With Second-Order Actor and Critic
[1] Jing Wang, et al. Least squares temporal difference actor-critic methods with applications to robot motion control, 2011, IEEE Conference on Decision and Control and European Control Conference.
[2] A. Barto, et al. Improved Temporal Difference Methods with Linear Function Approximation, 2004.
[3] Dimitri P. Bertsekas, et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares, 2009, IEEE Transactions on Automatic Control.
[4] Jing Wang, et al. A Hessian actor-critic algorithm, 2014, 53rd IEEE Conference on Decision and Control.
[5] Sham M. Kakade, et al. A Natural Policy Gradient, 2001, NIPS.
[6] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009.
[7] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[8] Jin Yu, et al. Natural Actor-Critic for Road Traffic Optimisation, 2006, NIPS.
[9] Ioannis Ch. Paschalidis, et al. A least squares temporal difference actor-critic algorithm with applications to warehouse management, 2012.
[10] Sham M. Kakade, et al. Optimizing Average Reward Using Discounted Rewards, 2001, COLT/EuroCOLT.
[11] Shalabh Bhatnagar, et al. Incremental Natural Actor-Critic Algorithms, 2007, NIPS.
[12] Jing Wang, et al. Temporal logic motion control using actor-critic methods, 2012, 2012 IEEE International Conference on Robotics and Automation.
[13] D. Harville. Matrix Algebra From a Statistician's Perspective, 1998.
[14] P. Nagaraju, et al. Application of actor-critic learning algorithm for optimal bidding problem of a Genco, 2003, 2003 IEEE Power Engineering Society General Meeting (IEEE Cat. No.03CH37491).
[15] Shun-ichi Amari, et al. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[16] Dimitri P. Bertsekas, et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation, 2003, Discret. Event Dyn. Syst.
[17] Andrew W. Fitzgibbon, et al. A fast natural Newton method, 2010, ICML.
[18] Justin A. Boyan, et al. Least-Squares Temporal Difference Learning, 1999, ICML.
[19] Andrew G. Barto, et al. Linear Least-Squares Algorithms for Temporal Difference Learning, 2005, Machine Learning.
[20] S. A. Soman, et al. Application of actor-critic learning algorithm for optimal bidding problem of a Genco, 2002.
[21] Takashi Omori, et al. Adaptive internal state space construction method for reinforcement learning of a real-world agent, 1999, Neural Networks.
[22] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[23] Shalabh Bhatnagar, et al. Natural actor-critic algorithms, 2009, Autom.
[24] Stefan Schaal, et al. Policy Gradient Methods for Robotics, 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[25] D. Bertsekas, et al. Approximate solution methods for partially observable Markov and semi-Markov decision processes, 2006.
[26] Ioannis Ch. Paschalidis, et al. An actor-critic method using Least Squares Temporal Difference learning, 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.
[27] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[28] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[29] Mehdi Khamassi, et al. Actor-Critic Models of Reinforcement Learning in the Basal Ganglia: From Natural to Artificial Rats, 2005, Adapt. Behav.
[30] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control Optim.
[31] John N. Tsitsiklis, et al. Simulation-based optimization of Markov reward processes, 1998, Proceedings of the 37th IEEE Conference on Decision and Control (Cat. No.98CH36171).