Conclusions, Future Directions and Outlook