Conclusions, Future Directions and Outlook