Reinforcement Learning in Continuous State and Action Spaces
[1] Shimon Whiteson,et al. Evolutionary Function Approximation for Reinforcement Learning , 2006, J. Mach. Learn. Res..
[2] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[3] Michail G. Lagoudakis,et al. Binary action search for learning continuous-action control policies , 2009, ICML '09.
[4] R. A. Leibler,et al. On Information and Sufficiency , 1951 .
[5] Scott Kirkpatrick,et al. Optimization by simulated annealing: Quantitative studies , 1984 .
[6] M. J. D. Powell,et al. UOBYQA: unconstrained optimization by quadratic approximation , 2002, Math. Program..
[7] L. D. Whitley,et al. Genetic Reinforcement Learning for Neurocontrol Problems , 2004, Machine Learning.
[8] Lawrence. Davis,et al. Handbook Of Genetic Algorithms , 1990 .
[9] Charles W. Anderson,et al. Learning to Control an Inverted Pendulum with Connectionist Networks , 1988, 1988 American Control Conference.
[10] John H. Holland,et al. Outline for a Logical Theory of Adaptive Systems , 1962, JACM.
[11] Doina Precup,et al. Eligibility Traces for Off-Policy Policy Evaluation , 2000, ICML.
[12] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[13] Paul J. Werbos,et al. Consistency of HDP applied to a simple reinforcement learning problem , 1990, Neural Networks.
[14] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[15] R. Bellman. Dynamic programming. , 1957, Science.
[16] Risto Miikkulainen,et al. Efficient Reinforcement Learning through Symbiotic Evolution , 2004 .
[17] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[18] Marc Schoenauer,et al. Supervised and Evolutionary Learning of Echo State Networks , 2008, PPSN.
[19] Pierre Geurts,et al. Tree-Based Batch Mode Reinforcement Learning , 2005, J. Mach. Learn. Res..
[20] Risto Miikkulainen,et al. Efficient Reinforcement Learning Through Evolving Neural Network Topologies , 2002, GECCO.
[21] Vivek S. Borkar,et al. Actor-Critic - Type Learning Algorithms for Markov Decision Processes , 1999, SIAM J. Control. Optim..
[22] Bruno Scherrer,et al. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view , 2010, ICML.
[23] Terrence J. Sejnowski,et al. TD(λ) Converges with Probability 1 , 1994, Machine Learning.
[24] J. Albus. A Theory of Cerebellar Function , 1971 .
[25] Martin Berggren,et al. Hybrid differentiation strategies for simulation and analysis of applications in C++ , 2008, TOMS.
[26] Scott Kirkpatrick. Optimization by Simulated Annealing: Quantitative Studies , 1984 .
[27] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[28] Rémi Coulom,et al. Reinforcement Learning Using Neural Networks, with Applications to Motor Control. (Apprentissage par renforcement utilisant des réseaux de neurones, avec des applications au contrôle moteur) , 2002 .
[29] Andrea Bonarini. Delayed Reinforcement , Fuzzy Q-Learning and Fuzzy Logic Controllers , 1996 .
[30] Tony R. Martinez,et al. The general inefficiency of batch training for gradient descent learning , 2003, Neural Networks.
[31] M. Powell. The NEWUOA software for unconstrained optimization without derivatives , 2006 .
[32] P. J. Werbos,et al. Backpropagation and neurocontrol: a review and prospectus , 1989, International 1989 Joint Conference on Neural Networks.
[33] Tom Schaul,et al. Efficient natural evolution strategies , 2009, GECCO.
[34] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009, Autom..
[35] Martin A. Riedmiller. Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method , 2005, ECML.
[36] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[37] Francisco Herrera,et al. Genetic Algorithms and Soft Computing , 1996 .
[38] R. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010 .
[39] Lionel Jouffe,et al. Fuzzy inference system learning by reinforcement methods , 1998, IEEE Trans. Syst. Man Cybern. Part C.
[40] Richard S. Sutton,et al. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces , 2010, Artificial General Intelligence.
[41] A. E. Eiben,et al. Introduction to Evolutionary Computing , 2003, Natural Computing Series.
[42] Petros Koumoutsakos,et al. Reducing the Time Complexity of the Derandomized Evolution Strategy with Covariance Matrix Adaptation (CMA-ES) , 2003, Evolutionary Computation.
[43] Warren B. Powell,et al. Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley Series in Probability and Statistics) , 2007 .
[44] Shimon Whiteson,et al. A theoretical and empirical analysis of Expected Sarsa , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[45] G. Saridis,et al. Approximate Solutions to the Time-Invariant Hamilton–Jacobi–Bellman Equation , 1998 .
[46] Alessandro Lazaric,et al. Finite-sample Analysis of Bellman Residual Minimization , 2010, ACML.
[47] John N. Tsitsiklis,et al. Actor-Critic Algorithms , 1999, NIPS.
[48] Geoffrey J. Gordon,et al. Approximate solutions to markov decision processes , 1999 .
[49] J. N. Edwards,et al. Physical Violence Between Siblings: A Theoretical and Empirical Analysis , 2005 .
[50] Risto Miikkulainen,et al. Accelerated Neural Evolution through Cooperatively Coevolved Synapses , 2008, J. Mach. Learn. Res..
[51] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[52] Vijay R. Konda,et al. On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..
[53] Shimon Whiteson,et al. Comparing evolutionary and temporal difference methods in a reinforcement learning domain , 2006, GECCO.
[54] Stefan Schaal,et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .
[55] Csaba Szepesvári. Algorithms for Reinforcement Learning , 2010 .
[56] A. Barto,et al. Improved Temporal Difference Methods with Linear Function Approximation , 2004 .
[57] Jan Peters,et al. Model learning for robot control: a survey , 2011, Cognitive Processing.
[58] Ingo Rechenberg,et al. Evolutionsstrategie : Optimierung technischer Systeme nach Prinzipien der biologischen Evolution , 1973 .
[59] Thomas Bäck,et al. An Overview of Evolutionary Algorithms for Parameter Optimization , 1993, Evolutionary Computation.
[60] William D. Smart,et al. Interpolation-based Q-learning , 2004, ICML.
[61] Thomas Bäck,et al. Evolutionary Algorithms in Theory and Practice , 1996 .
[62] Lotfi A. Zadeh,et al. Fuzzy Sets , 1996, Inf. Control..
[63] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[64] Huaguang Zhang,et al. Adaptive Dynamic Programming: An Introduction , 2009, IEEE Computational Intelligence Magazine.
[65] Nasser M. Nasrabadi,et al. Pattern Recognition and Machine Learning , 2006, Technometrics.
[66] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[67] András Lörincz,et al. Learning Tetris Using the Noisy Cross-Entropy Method , 2006, Neural Computation.
[68] Frank Sehnke,et al. Parameter-exploring policy gradients , 2010, Neural Networks.
[69] Simon Haykin,et al. Neural Networks: A Comprehensive Foundation , 1998 .
[70] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[71] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[72] Nikos A. Vlassis,et al. Perseus: Randomized Point-based Value Iteration for POMDPs , 2005, J. Artif. Intell. Res..
[73] Robert Babuska,et al. Fuzzy Modeling for Control , 1998 .
[74] M. Kendall. Statistical Methods for Research Workers , 1937, Nature.
[75] Alborz Geramifard,et al. iLSTD: Eligibility Traces and Convergence Analysis , 2006, NIPS.
[76] Yoshua Bengio,et al. Pattern Recognition and Neural Networks , 1995 .
[77] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[78] Sebastian Thrun,et al. Issues in Using Function Approximation for Reinforcement Learning , 1999 .
[79] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[80] George G. Lendaris,et al. Adaptive dynamic programming , 2002, IEEE Trans. Syst. Man Cybern. Part C.
[81] Stuart J. Russell,et al. Bayesian Q-Learning , 1998, AAAI/IAAI.
[82] Philipp Slusallek,et al. Introduction to real-time ray tracing , 2005, SIGGRAPH Courses.
[83] Anne Auger,et al. Comparing results of 31 algorithms from the black-box optimization benchmarking BBOB-2009 , 2010, GECCO '10.
[84] Changjiu Zhou,et al. Dynamic balance of a biped robot using fuzzy reinforcement learning agents , 2003, Fuzzy Sets Syst..
[85] Nikolaus Hansen,et al. Completely Derandomized Self-Adaptation in Evolution Strategies , 2001, Evolutionary Computation.
[86] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[87] Dirk P. Kroese,et al. The Cross Entropy Method: A Unified Approach To Combinatorial Optimization, Monte-carlo Simulation (Information Science and Statistics) , 2004 .
[88] Kary Främling. Replacing eligibility trace for action-value learning with function approximation , 2007, ESANN.
[89] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[90] Peter L. Bartlett,et al. Variance Reduction Techniques for Gradient Estimates in Reinforcement Learning , 2001, J. Mach. Learn. Res..
[91] Shalabh Bhatnagar,et al. Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation , 2009, NIPS.
[92] A. P. Wieland,et al. Evolving neural network controllers for unstable systems , 1991, IJCNN-91-Seattle International Joint Conference on Neural Networks.
[93] Christian Igel,et al. Evolution Strategies for Direct Policy Search , 2008, PPSN.
[94] Isao Ono,et al. Bidirectional Relation between CMA Evolution Strategies and Natural Evolution Strategies , 2010, PPSN.
[95] Shun-ichi Amari,et al. Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.
[96] Riccardo Poli,et al. Particle swarm optimization , 1995, Swarm Intelligence.
[97] F. Glover,et al. Handbook of Metaheuristics , 2019, International Series in Operations Research & Management Science.
[98] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[99] Arthur E. Bryson,et al. Applied Optimal Control , 1969 .
[100] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[101] Hado van Hasselt,et al. Double Q-learning , 2010, NIPS.
[102] Claude F. Touzet,et al. Neural reinforcement learning for behaviour synthesis , 1997, Robotics Auton. Syst..
[103] Alexander Zelinsky,et al. Q-Learning in Continuous State and Action Spaces , 1999, Australian Joint Conference on Artificial Intelligence.
[104] Approximate Solutions for the Couette Viscometry Equation , 2005 .
[105] P. Y. Glorennec,et al. Fuzzy Q-learning and dynamical fuzzy Q-learning , 1994, Proceedings of 1994 IEEE 3rd International Fuzzy Systems Conference.
[106] Steven J. Bradtke,et al. Linear Least-Squares algorithms for temporal difference learning , 2004, Machine Learning.
[107] P. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .
[108] Bart De Schutter,et al. Continuous-State Reinforcement Learning with Fuzzy Approximation , 2007, Adaptive Agents and Multi-Agents Systems.
[109] Huaiyu Zhu. On Information and Sufficiency , 1997 .
[110] Richard S. Sutton,et al. Neuronlike adaptive elements that can solve difficult learning control problems , 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[111] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[112] Stefan Schaal,et al. Reinforcement Learning for Humanoid Robotics , 2003 .
[113] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[114] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[115] R. Fisher,et al. On the Mathematical Foundations of Theoretical Statistics , 1922 .
[116] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[117] C.W. Anderson,et al. Learning to control an inverted pendulum using neural networks , 1989, IEEE Control Systems Magazine.
[118] Heekuck Oh,et al. Neural Networks for Pattern Recognition , 1993, Adv. Comput..
[119] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[120] Bart De Schutter,et al. Reinforcement Learning and Dynamic Programming Using Function Approximators , 2010 .
[121] John H. Holland,et al. Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence , 1992 .
[122] Csaba Szepesvári,et al. A Generalized Reinforcement-Learning Model: Convergence and Applications , 1996, ICML.
[123] Shalabh Bhatnagar,et al. Natural actor-critic algorithms , 2009 .
[124] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[125] Judy A. Franklin,et al. Biped dynamic walking using reinforcement learning , 1997, Robotics Auton. Syst..
[126] Gerald Tesauro,et al. Practical issues in temporal difference learning , 1992, Machine Learning.
[127] E. S. Pearson,et al. On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference, Part I , 1928 .
[128] David S. Touretzky,et al. Proceedings of the 1993 Connectionist Models Summer School , 2014 .
[129] C. S. George Lee,et al. Reinforcement structure/parameter learning for neural-network-based fuzzy logic control systems , 1994, IEEE Trans. Fuzzy Syst..
[130] Panos M. Pardalos,et al. Approximate dynamic programming: solving the curses of dimensionality , 2009, Optim. Methods Softw..
[131] Lih-Yuan Deng,et al. The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning , 2006, Technometrics.
[132] Hans-Paul Schwefel,et al. Evolution strategies – A comprehensive introduction , 2002, Natural Computing.
[133] Marco Wiering,et al. The QV family compared to other reinforcement learning algorithms , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[134] George J. Klir,et al. Fuzzy sets and fuzzy logic - theory and applications , 1995 .
[135] Andrew Y. Ng,et al. Policy Search via Density Estimation , 1999, NIPS.
[136] Camilla Nore,et al. A theoretical and empirical analysis , 2011 .
[137] Tom Schaul,et al. Natural Evolution Strategies , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).
[138] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[139] Michael I. Jordan,et al. Advances in Neural Information Processing Systems 30 , 1995 .
[140] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[141] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[142] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[143] Peter Dayan,et al. The convergence of TD(λ) for general λ , 1992, Machine Learning.
[144] Tom Schaul,et al. Exploring parameter space in reinforcement learning , 2010, Paladyn J. Behav. Robotics.
[145] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Vol. II , 1976 .
[146] Csaba Szepesvári,et al. Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.
[147] James S. Albus,et al. New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC) , 1975 .
[148] Thomas Bäck,et al. Evolutionary algorithms in theory and practice - evolution strategies, evolutionary programming, genetic algorithms , 1996 .
[149] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[150] Peter Dayan,et al. Technical Note: Q-Learning , 2004, Machine Learning.
[151] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[152] Kumpati S. Narendra,et al. Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..
[153] W. Vent,et al. Review of: Rechenberg, Ingo, Evolutionsstrategie — Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. 170 pp., 36 figures. Frommann-Holzboog-Verlag, Stuttgart 1973. Paperback , 1975 .
[154] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[155] Arnold Neumaier,et al. SNOBFIT -- Stable Noisy Optimization by Branch and Fit , 2008, TOMS.
[156] Ashwin Ram,et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces , 1997, Adapt. Behav..
[157] M. Bardi,et al. Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations , 1997 .
[158] Martin L. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[159] Richard S. Sutton,et al. A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation , 2008, NIPS.
[160] Dimitri P. Bertsekas,et al. Least Squares Policy Evaluation Algorithms with Linear Function Approximation , 2003, Discret. Event Dyn. Syst..
[161] M. Dahleh. Laboratory for Information and Decision Systems , 2005 .
[162] C. D. Gelatt,et al. Optimization by Simulated Annealing , 1983, Science.
[163] T. Michael Knasel,et al. Robotics and autonomous systems , 1988, Robotics Auton. Syst..
[164] Justin A. Boyan,et al. Technical Update: Least-Squares Temporal Difference Learning , 2002, Machine Learning.
[165] John J. Grefenstette,et al. Evolutionary Algorithms for Reinforcement Learning , 1999, J. Artif. Intell. Res..
[166] Tom Schaul,et al. Exponential natural evolution strategies , 2010, GECCO '10.
[167] Florentin Wörgötter,et al. Advances in Neural Information Processing Systems 16 (NIPS 2003) , 2004 .
[168] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.
[169] John N. Tsitsiklis,et al. Neuro-Dynamic Programming , 1996, Encyclopedia of Machine Learning.
[170] David Andre,et al. Model based Bayesian Exploration , 1999, UAI.
[171] Malcolm J. A. Strens,et al. A Bayesian Framework for Reinforcement Learning , 2000, ICML.
[172] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[173] M.A. Wiering,et al. Reinforcement Learning in Continuous Action Spaces , 2007, 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[174] Hamid R. Berenji,et al. Learning and tuning fuzzy logic controllers through reinforcements , 1992, IEEE Trans. Neural Networks.
[175] R. Sutton,et al. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation , 2008, NIPS 2008.
[176] Paul J. Werbos,et al. Neural networks for control and system identification , 1989, Proceedings of the 28th IEEE Conference on Decision and Control,.
[177] R. Rubinstein. The Cross-Entropy Method for Combinatorial and Continuous Optimization , 1999 .
[178] Long Ji Lin,et al. Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.
[179] Csaba Szepesvári,et al. Fitted Q-iteration in continuous action-space MDPs , 2007, NIPS.
[180] Frank L. Lewis,et al. Adaptive optimal control for continuous-time linear systems based on policy iteration , 2009, Autom..
[181] Richard S. Sutton,et al. Reinforcement Learning with Replacing Eligibility Traces , 2005, Machine Learning.
[182] Leemon C. Baird,et al. Reinforcement Learning With High-Dimensional, Continuous Actions , 1993 .
[183] Gerald Tesauro,et al. Temporal Difference Learning and TD-Gammon , 1995, J. Int. Comput. Games Assoc..
[184] Hyongsuk Kim,et al. CMAC-based adaptive critic self-learning control , 1991, IEEE Trans. Neural Networks.
[185] Zbigniew Michalewicz,et al. Evolutionary Computation 1 , 2018 .
[186] Shuqing Zeng,et al. Learning and tuning fuzzy logic controllers through genetic algorithm , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).
[187] H. R. Berenji,et al. Fuzzy Q-learning: a new approach for fuzzy dynamic programming , 1994, Proceedings of 1994 IEEE 3rd International Fuzzy Systems Conference.
[188] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[189] Stefan Schaal,et al. Natural Actor-Critic , 2003, Neurocomputing.
[190] Eduardo Alonso. Multi-agent learning , 2007, Autonomous Agents and Multi-Agent Systems.
[191] Marco Wiering,et al. Using continuous action spaces to solve discrete problems , 2009, 2009 International Joint Conference on Neural Networks.
[192] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[193] Csaba Szepesvári,et al. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path , 2006, Machine Learning.
[194] Alborz Geramifard,et al. Incremental Least-Squares Temporal Difference Learning , 2006, AAAI.