A Bibliography of Work Related to Reinforcement Learning
[1] Andrew W. Moore,et al. Variable Resolution Reinforcement Learning , 1995 .
[2] Richard S. Sutton,et al. The Truck Backer-Upper: An Example of Self-Learning in Neural Networks , 1995 .
[3] Richard S. Sutton,et al. A Menu of Designs for Reinforcement Learning Over Time , 1995 .
[4] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[5] Marco Colombetti,et al. Robot Shaping: Developing Autonomous Agents Through Learning , 1994, Artif. Intell..
[6] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.
[7] Marco Dorigo,et al. A comparison of Q-learning and classifier systems , 1994 .
[8] W. Estes. Toward a Statistical Theory of Learning , 1994 .
[9] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[10] Volker Tresp,et al. A Trivial but Fast Reinforcement Controller , 1994 .
[11] Roderic A. Grupen,et al. Robust Reinforcement Learning in Motion Planning , 1993, NIPS.
[12] Mark Ring. Two methods for hierarchy learning in reinforcement environments , 1993 .
[13] Jürgen Schmidhuber,et al. Planning simple trajectories using neural subgoal generators , 1993 .
[14] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[15] Richard S. Sutton,et al. Online Learning with Random Representations , 1993, ICML.
[16] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[17] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[18] Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
[19] Sebastian Thrun,et al. Exploration and model building in mobile robot domains , 1993, IEEE International Conference on Neural Networks.
[20] Gary McGraw,et al. Emergent Control and Planning in an Autonomous Vehicle , 1993 .
[21] Ronald J. Williams,et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Critic Learning Systems , 1993 .
[22] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[23] Eduardo D. Sontag,et al. Neural Networks for Control , 1993 .
[24] Eduardo D. Sontag,et al. Some Topics in Neural Networks and Control , 1993 .
[25] Andrew W. Moore,et al. Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping , 1992, NIPS.
[26] Steven J. Bradtke,et al. Reinforcement Learning Applied to Linear Quadratic Regulation , 1992, NIPS.
[27] Sebastian Thrun,et al. Explanation-Based Neural Network Learning for Robot Control , 1992, NIPS.
[28] Satinder P. Singh,et al. Reinforcement Learning with a Hierarchy of Abstract Models , 1992, AAAI.
[29] Satinder P. Singh,et al. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models , 1992, ML.
[30] Judy A. Franklin,et al. Learning channel allocation strategies in real time , 1992, [1992 Proceedings] Vehicular Technology Society 42nd VTS Conference - Frontiers of Technology.
[31] D. Sofge. The Role of Exploration in Learning Control , 1992 .
[32] S. Thrun. Efficient Exploration in Reinforcement Learning , 1992 .
[33] Andrew W. Moore,et al. Fast, Robust Adaptive Control by Learning only Forward Models , 1991, NIPS.
[34] Sebastian Thrun,et al. Active Exploration in Dynamic Environments , 1991, NIPS.
[35] Jürgen Schmidhuber,et al. Curious model-building control systems , 1991, [Proceedings] 1991 IEEE International Joint Conference on Neural Networks.
[36] Steven D. Whitehead,et al. A Complexity Analysis of Cooperative Mechanisms in Reinforcement Learning , 1991, AAAI.
[37] Ming Tan,et al. Cost-Sensitive Reinforcement Learning for Adaptive Classification and Control , 1991, AAAI.
[38] Lambert E. Wixson,et al. Scaling Reinforcement Learning Techniques via Modularity , 1991, ML.
[39] Steven D. Whitehead,et al. Complexity and Cooperation in Q-Learning , 1991, ML.
[40] Richard S. Sutton,et al. Reinforcement learning architectures for animats , 1991 .
[41] Jürgen Schmidhuber,et al. A possibility for implementing curiosity and boredom in model-building neural controllers , 1991 .
[42] Hans J. Bremermann,et al. How the Brain Adjusts Synapses - Maybe , 1991, Automated Reasoning: Essays in Honor of Woody Bledsoe.
[43] Jürgen Schmidhuber,et al. Learning to Generate Artificial Fovea Trajectories for Target Detection , 1991, Int. J. Neural Syst..
[44] Richard S. Sutton,et al. Planning by Incremental Dynamic Programming , 1991, ML.
[45] Dana H. Ballard,et al. Active Perception and Reinforcement Learning , 1990, Neural Computation.
[46] Jacques J. Vidal,et al. Adaptive Range Coding , 1990, NIPS.
[47] Jürgen Schmidhuber,et al. Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.
[48] Richard S. Sutton,et al. Advances in reinforcement learning and their implications for intelligent control , 1990, Proceedings. 5th IEEE International Symposium on Intelligent Control 1990.
[49] Andrew W. Moore,et al. Acquisition of Dynamic Control Knowledge for a Robotic Manipulator , 1990, ML.
[50] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[51] Ronald J. Williams,et al. Adaptive state representation and estimation using recurrent connectionist networks , 1990 .
[52] Richard S. Sutton,et al. Time-Derivative Models of Pavlovian Reinforcement , 1990 .
[53] Jürgen Schmidhuber,et al. Recurrent networks adjusted by adaptive critics , 1990 .
[54] Andrew G. Barto,et al. Connectionist learning for control , 1990 .
[55] Judy A. Franklin,et al. Historical perspective and state of the art in connectionist learning control , 1989, Proceedings of the 28th IEEE Conference on Decision and Control.
[56] D. Ballard,et al. A Role for Anticipation in Reactive Systems that Learn , 1989, ML.
[57] W. Thomas Miller,et al. Real-time application of neural networks for sensor-based control of robots with vision , 1989, IEEE Trans. Syst. Man Cybern..
[58] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[59] C.W. Anderson,et al. Learning to control an inverted pendulum using neural networks , 1989, IEEE Control Systems Magazine.
[60] John N. Tsitsiklis,et al. Parallel and distributed computation , 1989 .
[61] David H. Ackley,et al. Generalization and Scaling in Reinforcement Learning , 1989, NIPS.
[62] Wei-Min Shen,et al. Learning from the environment based on percepts and actions , 1989 .
[63] Jürgen Schmidhuber,et al. A Local Learning Algorithm for Dynamic Feedforward and Recurrent Networks , 1989 .
[64] R. J. Williams,et al. On the use of backpropagation in associative reinforcement learning , 1988, IEEE 1988 International Conference on Neural Networks.
[65] Judy A. Franklin. Compliance and learning: control skills for a robot operating in an uncertain world , 1988 .
[66] PAUL J. WERBOS,et al. Generalization of backpropagation with application to a recurrent gas market model , 1988, Neural Networks.
[67] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .
[68] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..
[69] Filson H. Glanz,et al. Application of a General Learning Algorithm to the Control of Robotic Manipulators , 1987 .
[70] W. Thomas Miller,et al. Sensor-based control of robotic manipulators using a general learning algorithm , 1987, IEEE J. Robotics Autom..
[71] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[72] Charles W. Anderson,et al. Strategy Learning with Multilayer Connectionist Representations , 1987 .
[73] S. Thomas Alexander,et al. Adaptive Signal Processing , 1986, Texts and Monographs in Computer Science.
[74] Richard S. Sutton,et al. Training and Tracking in Robotics , 1985, IJCAI.
[75] P. Anandan,et al. Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[76] A G Barto,et al. Learning by statistical cooperation of self-interested neuron-like computing elements. , 1985, Human neurobiology.
[77] M. A. L. Thathachar,et al. A new approach to the design of reinforcement schemes for learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[78] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[79] Hendrik Van Brussel,et al. A self-learning automaton with variable resolution for high precision assembly by industrial robots , 1982 .
[80] A G Barto,et al. Toward a modern theory of adaptive networks: expectation and prediction. , 1981, Psychological review.
[81] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res..
[82] Ian H. Witten,et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments , 1977, Inf. Control..
[83] Kumpati S. Narendra,et al. Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..
[84] Bernard Widrow,et al. Punish/Reward: Learning with a Critic in Adaptive Threshold Systems , 1973, IEEE Trans. Syst. Man Cybern..
[85] M. L. Tsetlin,et al. Automaton theory and modeling of biological systems , 1973 .
[86] R. Rescorla,et al. A theory of Pavlovian conditioning : Variations in the effectiveness of reinforcement and nonreinforcement , 1972 .
[87] A. S. Harding. Markovian decision processes , 1970 .
[88] Wilm E. Donath,et al. Hardware implementation , 1968, AFIPS '68 (Fall, part II).
[89] J. Laurie Snell,et al. Studies in mathematical learning theory. , 1960 .
[90] R. Howard. Dynamic Programming and Markov Processes , 1960 .
[91] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..