A Bibliography of Work Related to Reinforcement Learning

[1] Andrew W. Moore et al. Variable Resolution Reinforcement Learning, 1995.

[2] Richard S. Sutton et al. The Truck Backer-Upper: An Example of Self-Learning in Neural Networks, 1995.

[3] Richard S. Sutton et al. A Menu of Designs for Reinforcement Learning Over Time, 1995.

[4] Andrew G. Barto et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[5] Marco Colombetti et al. Robot Shaping: Developing Autonomous Agents Through Learning, 1994, Artif. Intell.

[6] Maja J. Mataric et al. Reward Functions for Accelerated Learning, 1994, ICML.

[7] Marco Dorigo et al. A comparison of Q-learning and classifier systems, 1994.

[8] W. Estes. Toward a Statistical Theory of Learning, 1994.

[9] Gerald Tesauro et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.

[10] Volker Tresp et al. A Trivial but Fast Reinforcement Controller, 1994.

[11] Roderic A. Grupen et al. Robust Reinforcement Learning in Motion Planning, 1993, NIPS.

[12] Mark Ring. Two methods for hierarchy learning in reinforcement environments, 1993.

[13] Jürgen Schmidhuber et al. Planning simple trajectories using neural subgoal generators, 1993.

[14] Anton Schwartz et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.

[15] Richard S. Sutton et al. Online Learning with Random Representations, 1993, ICML.

[16] Andrew McCallum et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.

[17] Leslie Pack Kaelbling et al. Learning in embedded systems, 1993.

[18] Jing Peng et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adapt. Behav.

[19] Sebastian Thrun et al. Exploration and model building in mobile robot domains, 1993, IEEE International Conference on Neural Networks.

[20] Gary McGraw et al. Emergent Control and Planning in an Autonomous Vehicle, 1993.

[21] Ronald J. Williams et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Critic Learning Systems, 1993.

[22] Ronald J. Williams et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.

[23] Eduardo D. Sontag et al. Neural Networks for Control, 1993.

[24] Eduardo D. Sontag et al. Some Topics in Neural Networks and Control, 1993.

[25] Andrew W. Moore et al. Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping, 1992, NIPS.

[26] Steven J. Bradtke et al. Reinforcement Learning Applied to Linear Quadratic Regulation, 1992, NIPS.

[27] Sebastian Thrun et al. Explanation-Based Neural Network Learning for Robot Control, 1992, NIPS.

[28] Satinder P. Singh et al. Reinforcement Learning with a Hierarchy of Abstract Models, 1992, AAAI.

[29] Satinder P. Singh et al. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models, 1992, ML.

[30] Judy A. Franklin et al. Learning channel allocation strategies in real time, 1992, [1992 Proceedings] Vehicular Technology Society 42nd VTS Conference - Frontiers of Technology.

[31] D. Sofge. The Role of Exploration in Learning Control, 1992.

[32] S. Thrun. Efficient Exploration in Reinforcement Learning, 1992.

[33] Andrew W. Moore et al. Fast, Robust Adaptive Control by Learning only Forward Models, 1991, NIPS.

[34] Sebastian Thrun et al. Active Exploration in Dynamic Environments, 1991, NIPS.

[35] Jürgen Schmidhuber et al. Curious model-building control systems, 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.

[36] Steven D. Whitehead et al. A Complexity Analysis of Cooperative Mechanisms in Reinforcement Learning, 1991, AAAI.

[37] Ming Tan et al. Cost-Sensitive Reinforcement Learning for Adaptive Classification and Control, 1991, AAAI.

[38] Lambert E. Wixson et al. Scaling Reinforcement Learning Techniques via Modularity, 1991, ML.

[39] Steven D. Whitehead et al. Complexity and Cooperation in Q-Learning, 1991, ML.

[40] Richard S. Sutton et al. Reinforcement learning architectures for animats, 1991.

[41] Jürgen Schmidhuber et al. A possibility for implementing curiosity and boredom in model-building neural controllers, 1991.

[42] Hans J. Bremermann et al. How the Brain Adjusts Synapses - Maybe, 1991, Automated Reasoning: Essays in Honor of Woody Bledsoe.

[43] Jürgen Schmidhuber et al. Learning to Generate Artificial Fovea Trajectories for Target Detection, 1991, Int. J. Neural Syst.

[44] Richard S. Sutton et al. Planning by Incremental Dynamic Programming, 1991, ML.

[45] Dana H. Ballard et al. Active Perception and Reinforcement Learning, 1990, Neural Computation.

[46] Jacques J. Vidal et al. Adaptive Range Coding, 1990, NIPS.

[47] Jürgen Schmidhuber et al. Reinforcement Learning in Markovian and Non-Markovian Environments, 1990, NIPS.

[48] Richard S. Sutton et al. Advances in reinforcement learning and their implications for intelligent control, 1990, Proceedings of the 5th IEEE International Symposium on Intelligent Control.

[49] Andrew W. Moore et al. Acquisition of Dynamic Control Knowledge for a Robotic Manipulator, 1990, ML.

[50] Richard S. Sutton et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.

[51] Ronald J. Williams et al. Adaptive state representation and estimation using recurrent connectionist networks, 1990.

[52] Richard S. Sutton et al. Time-Derivative Models of Pavlovian Reinforcement, 1990.

[53] Jürgen Schmidhuber et al. Recurrent networks adjusted by adaptive critics, 1990.

[54] Andrew G. Barto et al. Connectionist learning for control, 1990.

[55] Judy A. Franklin et al. Historical perspective and state of the art in connectionist learning control, 1989, Proceedings of the 28th IEEE Conference on Decision and Control.

[56] D. Ballard et al. A Role for Anticipation in Reactive Systems that Learn, 1989, ML.

[57] W. Thomas Miller et al. Real-time application of neural networks for sensor-based control of robots with vision, 1989, IEEE Trans. Syst. Man Cybern.

[58] Kumpati S. Narendra et al. Learning automata - an introduction, 1989.

[59] C. W. Anderson et al. Learning to control an inverted pendulum using neural networks, 1989, IEEE Control Systems Magazine.

[60] John N. Tsitsiklis et al. Parallel and distributed computation, 1989.

[61] David H. Ackley et al. Generalization and Scaling in Reinforcement Learning, 1989, NIPS.

[62] Wei-Min Shen et al. Learning from the environment based on percepts and actions, 1989.

[63] Jürgen Schmidhuber et al. A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks, 1989.

[64] R. J. Williams et al. On the use of backpropagation in associative reinforcement learning, 1988, IEEE 1988 International Conference on Neural Networks.

[65] Judy A. Franklin. Compliance and learning: control skills for a robot operating in an uncertain world, 1988.

[66] Paul J. Werbos et al. Generalization of backpropagation with application to a recurrent gas market model, 1988, Neural Networks.

[67] P. W. Jones et al. Bandit Problems, Sequential Allocation of Experiments, 1987.

[68] John N. Tsitsiklis et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.

[69] Filson H. Glanz et al. Application of a General Learning Algorithm to the Control of Robotic Manipulators, 1987.

[70] W. Thomas Miller et al. Sensor-based control of robotic manipulators using a general learning algorithm, 1987, IEEE J. Robotics Autom.

[71] Dimitri P. Bertsekas et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.

[72] Charles W. Anderson et al. Strategy Learning with Multilayer Connectionist Representations, 1987.

[73] S. Thomas Alexander et al. Adaptive Signal Processing, 1986, Texts and Monographs in Computer Science.

[74] Richard S. Sutton et al. Training and Tracking in Robotics, 1985, IJCAI.

[75] P. Anandan et al. Pattern-recognizing stochastic learning automata, 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[76] A. G. Barto et al. Learning by statistical cooperation of self-interested neuron-like computing elements, 1985, Human Neurobiology.

[77] M. A. L. Thathachar et al. A new approach to the design of reinforcement schemes for learning automata, 1985, IEEE Transactions on Systems, Man, and Cybernetics.

[78] Richard S. Sutton et al. Temporal credit assignment in reinforcement learning, 1984.

[79] Hendrik Van Brussel et al. A self-learning automaton with variable resolution for high precision assembly by industrial robots, 1982.

[80] A. G. Barto et al. Toward a modern theory of adaptive networks: expectation and prediction, 1981, Psychological Review.

[81] Edward J. Sondik et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs, 1978, Oper. Res.

[82] Ian H. Witten et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.

[83] Kumpati S. Narendra et al. Learning Automata - A Survey, 1974, IEEE Trans. Syst. Man Cybern.

[84] Bernard Widrow et al. Punish/Reward: Learning with a Critic in Adaptive Threshold Systems, 1973, IEEE Trans. Syst. Man Cybern.

[85] M. L. Tsetlin et al. Automaton theory and modeling of biological systems, 1973.

[86] R. Rescorla et al. A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement, 1972.

[87] A. S. Harding. Markovian decision processes, 1970.

[88] Wilm E. Donath et al. Hardware implementation, 1968, AFIPS '68 (Fall, part II).

[89] J. Laurie Snell et al. Studies in mathematical learning theory, 1960.

[90] R. Howard. Dynamic Programming and Markov Processes, 1960.

[91] Arthur L. Samuel et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.