Cross-Entropy Optimization of Control Policies With Adaptive Basis Functions
Bart De Schutter | Damien Ernst | Robert Babuška | Lucian Buşoniu
[1] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[2] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[3] John Rust. Numerical dynamic programming in economics, 1996.
[4] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.
[5] A. A. Jafari, et al. Genetic algorithm methods for solving the best stationary policy of finite Markov decision processes, 1998, Proceedings of Thirtieth Southeastern Symposium on System Theory.
[6] Preben Alstrøm, et al. Learning to Drive a Bicycle Using Reinforcement Learning and Shaping, 1998, ICML.
[7] John N. Tsitsiklis, et al. Actor-Critic Algorithms, 1999, NIPS.
[8] Danny Barash, et al. A Genetic Search In Policy Space For Solving Markov Decision Processes, 1999.
[9] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.
[10] Michael I. Jordan, et al. PEGASUS: A policy search method for large MDPs and POMDPs, 2000, UAI.
[11] John N. Tsitsiklis, et al. Approximate Gradient Methods in Policy-Space Optimization of Markov Reward Processes, 2003, Discret. Event Dyn. Syst.
[12] Shie Mannor, et al. The Cross Entropy Method for Fast Policy Search, 2003, ICML.
[13] Michail G. Lagoudakis, et al. Least-Squares Policy Iteration, 2003, J. Mach. Learn. Res.
[14] Michail G. Lagoudakis, et al. Reinforcement Learning as Classification: Leveraging Modern Classifiers, 2003, ICML.
[15] Andrew W. Moore, et al. Variable Resolution Discretization in Optimal Control, 2002, Machine Learning.
[16] B. Adams, et al. Dynamic multidrug therapies for HIV: optimal and STI control approaches, 2004, Mathematical Biosciences and Engineering: MBE.
[17] John N. Tsitsiklis, et al. Feature-based methods for large scale dynamic programming, 2004, Machine Learning.
[18] Pierre Geurts, et al. Tree-Based Batch Mode Reinforcement Learning, 2005, J. Mach. Learn. Res.
[19] Dimitri P. Bertsekas, et al. Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC, 2005, Eur. J. Control.
[20] Shie Mannor, et al. Basis Function Adaptation in Temporal Difference Reinforcement Learning, 2005, Ann. Oper. Res.
[21] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[22] Rémi Munos, et al. Policy Gradient in Continuous Time, 2006, J. Mach. Learn. Res.
[23] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[24] Louis Wehenkel, et al. Clinical data based optimal STI strategies for HIV: a reinforcement learning approach, 2006, Proceedings of the 45th IEEE Conference on Decision and Control.
[25] Liming Xiang, et al. Kernel-Based Reinforcement Learning, 2006, ICIC.
[26] R. Clayton, et al. Epicardial ECG Mapping of Human Ventricular Fibrillation, 2006.
[27] Lih-Yuan Deng, et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning, 2006, Technometrics.
[28] Kwong-Sak Leung, et al. A Memetic Algorithm for Multiple-Drug Cancer Chemotherapy Schedule Optimization, 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[29] Dirk P. Kroese, et al. Convergence properties of the cross-entropy method for discrete optimization, 2007, Oper. Res. Lett.
[30] Bart De Schutter, et al. Continuous-State Reinforcement Learning with Fuzzy Approximation, 2007, Adaptive Agents and Multi-Agents Systems.
[31] Sridhar Mahadevan, et al. Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes, 2007, J. Mach. Learn. Res.
[32] Martin A. Riedmiller, et al. Evaluation of Policy Gradient Methods and Variants on the Cart-Pole Benchmark, 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[33] Christopher G. Atkeson, et al. Random Sampling of States in Dynamic Programming, 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[34] Derong Liu, et al. Adaptive Critic Learning Techniques for Engine Torque and Air–Fuel Ratio Control, 2008, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
[35] Christos Dimitrakakis, et al. Rollout sampling approximate policy iteration, 2008, Machine Learning.
[36] Bart De Schutter, et al. Policy search with cross-entropy optimization of basis functions, 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[37] Donald R. Jones, et al. Direct Global Optimization Algorithm, 2009, Encyclopedia of Optimization.
[38] Steven I. Marcus, et al. Simulation-based Algorithms for Markov Decision Processes / Hyeong Soo Chang ... [et al.], 2013.