A least squares temporal difference actor–critic algorithm with applications to warehouse management