Characterizing reinforcement learning methods through parameterized learning problems
[1] Shimon Whiteson,et al. Adaptive job routing and scheduling , 2004, Eng. Appl. Artif. Intell..
[2] Vijay R. Konda,et al. On Actor-Critic Algorithms , 2003, SIAM J. Control. Optim..
[3] Peter Dayan,et al. Q-learning , 1992, Machine Learning.
[4] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[5] Eric Bauer,et al. An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants , 1999, Machine Learning.
[6] L. Buşoniu. Evolutionary function approximation for reinforcement learning , 2006 .
[7] Peter Stone,et al. Function Approximation via Tile Coding: Automating Parameter Choice , 2005, SARA.
[8] Warren B. Powell,et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming , 2006, Machine Learning.
[9] H. Beyer. Evolutionary algorithms in noisy environments : theoretical issues and guidelines for practice , 2000 .
[10] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[11] Onur Mutlu,et al. Self-Optimizing Memory Controllers: A Reinforcement Learning Approach , 2008, 2008 International Symposium on Computer Architecture.
[12] Adele E. Howe,et al. How evaluation guides AI research , 1988 .
[13] Michael Kearns,et al. Reinforcement learning for optimized trade execution , 2006, ICML.
[14] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[15] Michele Banko,et al. Scaling to Very Very Large Corpora for Natural Language Disambiguation , 2001, ACL.
[16] Lihong Li,et al. The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning , 2009, ICML '09.
[17] Yoav Freund,et al. Experiments with a New Boosting Algorithm , 1996, ICML.
[18] Kevin Leyton-Brown,et al. SATzilla: Portfolio-based Algorithm Selection for SAT , 2008, J. Artif. Intell. Res..
[19] Daniel Kudenko,et al. Improving Optimistic Exploration in Model-Free Reinforcement Learning , 2009, ICANNGA.
[20] R. Bellman. Dynamic programming. , 1957, Science.
[21] Andrew W. Moore,et al. Variable Resolution Discretization in Optimal Control , 2002, Machine Learning.
[22] Peter Stone,et al. Batch reinforcement learning in a complex domain , 2007, AAMAS '07.
[23] Sebastian Thrun,et al. Issues in Using Function Approximation for Reinforcement Learning , 1999 .
[24] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[25] H. Sebastian Seung,et al. Stochastic policy gradient reinforcement learning on a simple 3D biped , 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566).
[26] Shalabh Bhatnagar,et al. Toward Off-Policy Learning Control with Function Approximation , 2010, ICML.
[27] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[28] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[29] Michael L. Littman,et al. An optimization-based categorization of reinforcement learning environments , 1993 .
[30] James C. Spall,et al. Introduction to stochastic search and optimization - estimation, simulation, and control , 2003, Wiley-Interscience series in discrete mathematics and optimization.
[31] Jürgen Schmidhuber,et al. A robot that reinforcement-learns to identify and memorize important previous observations , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).
[32] Dana H. Ballard,et al. Learning to perceive and act by trial and error , 1991, Machine Learning.
[33] Rajarshi Das,et al. On the use of hybrid reinforcement learning for autonomic resource allocation , 2007, Cluster Computing.
[34] Pat Langley. Machine Learning as an Experimental Science , 2005, Machine Learning.
[35] Bart Selman,et al. Algorithm portfolios , 2001, Artif. Intell..
[36] Risto Miikkulainen,et al. Efficient evolution of neural networks through complexification , 2004 .
[37] Tom M. Mitchell,et al. Reinforcement learning with hidden states , 1993 .
[38] John J. Grefenstette,et al. Evolutionary Algorithms for Reinforcement Learning , 1999, J. Artif. Intell. Res..
[39] Sridhar Mahadevan,et al. Learning Representation and Control in Markov Decision Processes: New Frontiers , 2009, Found. Trends Mach. Learn..
[40] Michael R. James,et al. SarsaLandmark: an algorithm for learning in POMDPs with landmarks , 2009, AAMAS.
[41] Stefan Schaal,et al. 2008 Special Issue: Reinforcement learning of motor skills with policy gradients , 2008 .
[42] G. Tesauro. Practical Issues in Temporal Difference Learning , 1992 .
[43] Sanjoy Dasgupta,et al. Off-Policy Temporal Difference Learning with Function Approximation , 2001, ICML.
[44] Christian Igel,et al. Variable Metric Reinforcement Learning Methods Applied to the Noisy Mountain Car Problem , 2008, EWRL.
[45] Yngvi Björnsson,et al. Simulation-Based Approach to General Game Playing , 2008, AAAI.
[46] Chris Watkins,et al. Learning from delayed rewards , 1989 .
[47] Yishay Mansour,et al. Convergence of Optimistic and Incremental Q-Learning , 2001, NIPS.
[48] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[49] Doina Precup,et al. Using MDP Characteristics to Guide Exploration in Reinforcement Learning , 2003, ECML.
[50] John Loch,et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes , 1998, ICML.
[51] Martin A. Riedmiller,et al. A Case Study on Improving Defense Behavior in Soccer Simulation 2D: The NeuroHassle Approach , 2009, RoboCup.
[52] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[53] Shie Mannor,et al. A Tutorial on the Cross-Entropy Method , 2005, Ann. Oper. Res..
[54] András Lörincz,et al. The many faces of optimism: a unifying approach , 2008, ICML '08.
[55] Risto Miikkulainen,et al. Solving Non-Markovian Control Tasks with Neuro-Evolution , 1999, IJCAI.
[56] Andrew McCallum,et al. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State , 1995, ICML.
[57] Olivier Sigaud,et al. Learning the structure of Factored Markov Decision Processes in reinforcement learning problems , 2006, ICML.
[58] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
[59] Peter Stone,et al. Machine Learning for Fast Quadrupedal Locomotion , 2004, AAAI.
[60] Philip N. Sabes. Approximating Q values with Basis Function Representations , 2004 .
[61] Doina Precup,et al. A Convergent Form of Approximate Policy Iteration , 2002, NIPS.
[62] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS 1996.
[63] Andrew G. Barto,et al. Improving Elevator Performance Using Reinforcement Learning , 1995, NIPS.
[64] Rich Caruana,et al. An empirical comparison of supervised learning algorithms , 2006, ICML.
[65] Ricardo Vilalta,et al. A Perspective View and Survey of Meta-Learning , 2002, Artificial Intelligence Review.
[66] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[67] Christian Igel,et al. Similarities and differences between policy gradient methods and evolution strategies , 2008, ESANN.
[68] Shimon Whiteson,et al. A theoretical and empirical analysis of Expected Sarsa , 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.
[69] Shimon Whiteson,et al. Protecting against evaluation overfitting in empirical reinforcement learning , 2011, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL).
[70] Karl Johan Åström,et al. Optimal control of Markov processes with incomplete state information , 1965 .
[71] Jay H. Lee,et al. Neuro-dynamic programming method for MPC , 2001 .
[72] Leo Breiman,et al. Random Forests , 2001, Machine Learning.
[73] Peter Bock,et al. Using a Genetic Algorithm to Search for the Representational Bias of a Collective Reinforcement Learner , 1994, PPSN.
[74] Yoav Shoham,et al. Boosting as a Metaphor for Algorithm Design , 2003, CP.
[75] Ben Tse,et al. Autonomous Inverted Helicopter Flight via Reinforcement Learning , 2004, ISER.
[76] David H. Wolpert,et al. No free lunch theorems for optimization , 1997, IEEE Trans. Evol. Comput..
[77] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm , 1987, 28th Annual Symposium on Foundations of Computer Science (sfcs 1987).
[78] Dieter Fox,et al. Reinforcement learning for sensing strategies , 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566).
[79] Longxin Lin. Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching , 2004, Machine Learning.
[80] Michael Kearns,et al. Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.
[81] Carla E. Brodley,et al. Recursive automatic bias selection for classifier construction , 1995, Machine Learning.
[82] Risto Miikkulainen,et al. Accelerated Neural Evolution through Cooperatively Coevolved Synapses , 2008, J. Mach. Learn. Res..
[83] Helen G. Cobb. Inductive Biases in a Reinforcement Learner , 1992 .
[84] Leslie Pack Kaelbling,et al. Acting Optimally in Partially Observable Stochastic Domains , 1994, AAAI.
[85] Christian Igel,et al. Hoeffding and Bernstein races for selecting policies in evolutionary direct policy search , 2009, ICML '09.
[86] P. Dayan,et al. TD(λ) converges with probability 1 , 2004, Machine Learning.
[87] Hilan Bensusan,et al. Meta-Learning by Landmarking Various Learning Algorithms , 2000, ICML.
[88] Michail G. Lagoudakis,et al. Coordinated Reinforcement Learning , 2002, ICML.
[89] Julian Togelius,et al. Ontogenetic and Phylogenetic Reinforcement Learning , 2009, Künstliche Intell..
[90] Gavin Adrian Rummery. Problem solving with reinforcement learning , 1995 .
[91] Rich Caruana,et al. An empirical evaluation of supervised learning in high dimensions , 2008, ICML '08.
[92] Liming Xiang,et al. Kernel-Based Reinforcement Learning , 2006, ICIC.
[93] Shane Legg,et al. Temporal Difference Updating without a Learning Rate , 2007, NIPS.
[94] Richard S. Sutton,et al. Reinforcement learning with replacing eligibility traces , 2004, Machine Learning.
[95] S. Shankar Sastry,et al. Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.
[96] Michail G. Lagoudakis,et al. Least-Squares Policy Iteration , 2003, J. Mach. Learn. Res..
[97] András Lörincz,et al. Learning Tetris Using the Noisy Cross-Entropy Method , 2006, Neural Computation.
[98] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[99] Joelle Pineau,et al. Anytime Point-Based Approximations for Large POMDPs , 2006, J. Artif. Intell. Res..
[100] Sean P. Meyn,et al. An analysis of reinforcement learning with function approximation , 2008, ICML '08.
[101] Paul R. Cohen,et al. How Evaluation Guides AI Research: The Message Still Counts More than the Medium , 1988, AI Mag..
[102] Peter Stone,et al. Reinforcement Learning for RoboCup Soccer Keepaway , 2005, Adapt. Behav..
[103] Marek Petrik,et al. Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes , 2010, ICML.
[104] Lihong Li,et al. PAC model-free reinforcement learning , 2006, ICML.
[105] Petros Koumoutsakos,et al. A Method for Handling Uncertainty in Evolutionary Optimization With an Application to Feedback Control of Combustion , 2009, IEEE Transactions on Evolutionary Computation.
[106] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[107] Richard S. Sutton,et al. Reinforcement Learning of Local Shape in the Game of Go , 2007, IJCAI.
[108] James S. Albus,et al. Brains, behavior, and robotics , 1981 .
[109] Christian Igel,et al. Efficient covariance matrix update for variable metric evolution strategies , 2009, Machine Learning.
[110] Lonnie Chrisman,et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.
[111] Shimon Whiteson,et al. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning , 2010, Autonomous Agents and Multi-Agent Systems.
[112] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.
[113] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.
[114] Theodore J. Perkins,et al. On the Existence of Fixed Points for Q-Learning and Sarsa in Partially Observable Domains , 2002, ICML.
[115] András Lörincz,et al. Learning to Play Using Low-Complexity Rule-Based Policies: Illustrations through Ms. Pac-Man , 2007, J. Artif. Intell. Res..
[116] Michael L. Littman,et al. A theoretical analysis of Model-Based Interval Estimation , 2005, ICML.
[117] Chih-Han Yu,et al. Quadruped robot obstacle negotiation via reinforcement learning , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..
[118] Shalabh Bhatnagar,et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation , 2009, ICML '09.
[119] Andrew Y. Ng,et al. Regularization and feature selection in least-squares temporal difference learning , 2009, ICML '09.
[120] Frank Kirchner,et al. Analysis of an evolutionary reinforcement learning method in a multiagent domain , 2008, AAMAS.
[121] Richard S. Sutton,et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding , 1996 .
[122] Scott Sanner,et al. Temporal Difference Bayesian Model Averaging: A Bayesian Perspective on Adapting Lambda , 2010, ICML.
[123] Jing Peng,et al. Incremental multi-step Q-learning , 1994, Machine Learning.
[124] J. Ross Quinlan,et al. Bagging, Boosting, and C4.5 , 1996, AAAI/IAAI, Vol. 1.
[125] Joelle Pineau,et al. Adaptive Treatment of Epilepsy via Batch-mode Reinforcement Learning , 2008, AAAI.
[126] Peter Dayan,et al. Technical Note: Q-Learning , 2004, Machine Learning.
[127] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[128] Risto Miikkulainen,et al. Active Guidance for a Finless Rocket Using Neuroevolution , 2003, GECCO.