Modular on-line function approximation for scaling up reinforcement learning
[1] Kumpati S. Narendra,et al. Learning Automata - A Survey , 1974, IEEE Trans. Syst. Man Cybern..
[2] Chris Chatfield,et al. Statistics for Technology (A Course in Applied Statistics) , 1984 .
[3] James S. Albus,et al. Data Storage in the Cerebellar Model Articulation Controller (CMAC) , 1975 .
[4] Jon Louis Bentley,et al. Multidimensional binary search trees used for associative searching , 1975, CACM.
[5] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[6] John M. Hollerbach,et al. A Recursive Lagrangian Formulation of Manipulator Dynamics , 1980 .
[7] John M. Hollerbach,et al. A Recursive Lagrangian Formulation of Manipulator Dynamics and a Comparative Study of Dynamics Formulation Complexity , 1980, IEEE Transactions on Systems, Man, and Cybernetics.
[8] Tomás Lozano-Pérez,et al. Automatic Planning of Manipulator Transfer Movements , 1981, IEEE Transactions on Systems, Man, and Cybernetics.
[9] Richard S. Sutton,et al. Temporal credit assignment in reinforcement learning , 1984 .
[10] P. Anandan,et al. Pattern-recognizing stochastic learning automata , 1985, IEEE Transactions on Systems, Man, and Cybernetics.
[11] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[12] John H. Holland,et al. Escaping brittleness: the possibilities of general-purpose learning algorithms applied to parallel rule-based systems , 1995 .
[13] King-Sun Fu,et al. Learning Control Systems-Review and Outlook , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[14] Dimitri P. Bertsekas,et al. Dynamic Programming: Deterministic and Stochastic Models , 1987 .
[15] Scott E. Fahlman,et al. An empirical study of learning speed in back-propagation networks , 1988 .
[16] John E. Moody,et al. Fast Learning in Multi-Resolution Hierarchies , 1988, NIPS.
[17] Kimon P. Valavanis,et al. Analytical design of intelligent machines , 1985, Autom..
[18] John Moody,et al. Fast Learning in Networks of Locally-Tuned Processing Units , 1989, Neural Computation.
[19] John Scott Bridle,et al. Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition , 1989, NATO Neurocomputing.
[20] Kumpati S. Narendra,et al. Learning automata - an introduction , 1989 .
[21] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[22] Rodney A. Brooks,et al. A robot that walks; emergent behaviors from a carefully evolved network , 1989, Proceedings, 1989 International Conference on Robotics and Automation.
[23] W. Thomas Miller,et al. Real-time application of neural networks for sensor-based control of robots with vision , 1989, IEEE Trans. Syst. Man Cybern..
[24] Pattie Maes,et al. Designing autonomous agents: Theory and practice from biology to engineering and back , 1990, Robotics Auton. Syst..
[25] Vijaykumar Gullapalli,et al. A stochastic reinforcement learning algorithm for learning real-valued functions , 1990, Neural Networks.
[26] Andrew W. Moore,et al. Efficient memory-based learning for robot control , 1990 .
[27] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[28] John C. Platt. A Resource-Allocating Network for Function Interpolation , 1991, Neural Computation.
[29] John E. W. Mayhew,et al. Obstacle Avoidance through Reinforcement Learning , 1991, NIPS.
[30] Michael I. Jordan,et al. Hierarchies of Adaptive Experts , 1991, NIPS.
[31] Steven J. Nowlan,et al. Soft competitive adaptation: neural network learning algorithms based on fitting statistical mixtures , 1991 .
[32] D. Ellison,et al. On the Convergence of the Multidimensional Albus Perceptron , 1991, Int. J. Robotics Res..
[33] Hyongsuk Kim,et al. CMAC-based adaptive critic self-learning control , 1991, IEEE Trans. Neural Networks.
[34] Michael I. Jordan,et al. Task Decomposition Through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks , 1990, Cogn. Sci..
[35] Geoffrey E. Hinton,et al. Adaptive Mixtures of Local Experts , 1991, Neural Computation.
[36] S. Thrun. Efficient Exploration in Reinforcement Learning , 1992 .
[37] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[38] Vijaykumar Gullapalli,et al. Learning Control Under Extreme Uncertainty , 1992, NIPS.
[39] Andrew W. Moore,et al. Memory-Based Reinforcement Learning: Efficient Computation with Prioritized Sweeping , 1992, NIPS.
[40] Michael I. Jordan,et al. Forward Models: Supervised Learning with a Distal Teacher , 1992, Cogn. Sci..
[41] Satinder Singh. The Efficient Learning of Multiple Task Sequences , 1992 .
[42] David H. Wolpert,et al. Stacked generalization , 1992, Neural Networks.
[43] Steven J. Bradtke,et al. Reinforcement Learning Applied to Linear Quadratic Regulation , 1992, NIPS.
[44] Charles W. Anderson,et al. Q-Learning with Hidden-Unit Restarting , 1992, NIPS.
[45] Vijaykumar Gullapalli,et al. Reinforcement learning and its application to control , 1992 .
[46] David A. Cohn,et al. Neural Network Exploration Using Optimal Experiment Design , 1993, NIPS.
[47] L.-J. Lin,et al. Hierarchical learning of robot skills by reinforcement , 1993, IEEE International Conference on Neural Networks.
[48] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[49] Visakan Kadirkamanathan,et al. A Function Estimation Approach to Sequential Learning with Neural Networks , 1993, Neural Computation.
[50] Roderic A. Grupen,et al. Robust Reinforcement Learning in Motion Planning , 1993, NIPS.
[51] Radford M. Neal. A new view of the EM algorithm that justifies incremental and other variants , 1993 .
[52] David J. Spiegelhalter,et al. Bayesian analysis in expert systems , 1993 .
[53] Steven J. Nowlan,et al. Mixtures of Controllers for Jump Linear and Non-Linear Plants , 1993, NIPS.
[54] Terrence J. Sejnowski,et al. Temporal Difference Learning of Position Evaluation in the Game of Go , 1993, NIPS.
[55] Sebastian Thrun,et al. Exploration and model building in mobile robot domains , 1993, IEEE International Conference on Neural Networks.
[56] Ronald J. Williams,et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions , 1993 .
[57] Richard S. Sutton,et al. Online Learning with Random Representations , 1993, ICML.
[58] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[59] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[60] Ronald J. Williams,et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Cr , 1993 .
[61] Satinder Singh,et al. Learning to Solve Markovian Decision Processes , 1993 .
[62] Jing Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, Adapt. Behav..
[63] J. Peng,et al. Efficient Learning and Planning Within the Dyna Framework , 1993, IEEE International Conference on Neural Networks.
[64] Long Ji Lin,et al. Scaling Up Reinforcement Learning for Robot Control , 1993, International Conference on Machine Learning.
[65] Leslie Pack Kaelbling,et al. Hierarchical Learning in Stochastic Domains: Preliminary Results , 1993, ICML.
[66] Robert A. Jacobs,et al. Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.
[67] Tony J. Prescott,et al. Explorations in Reinforcement and Model-based Learning , 1994 .
[68] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[69] David A. Cohn,et al. Active Learning with Statistical Models , 1996, NIPS.
[70] Michael I. Jordan,et al. MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY and CENTER FOR BIOLOGICAL AND COMPUTATIONAL LEARNING DEPARTMENT OF BRAIN AND COGNITIVE SCIENCES , 1996 .
[71] Richard W. Prager,et al. A Modular Q-Learning Architecture for Manipulator Task Decomposition , 1994, ICML.
[72] A. Piper. Object-oriented divide-and-conquer for parallel processing , 1994 .
[73] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[74] Geoffrey E. Hinton,et al. An Alternative Model for Mixtures of Experts , 1994, NIPS.
[75] Steve R. Waterhouse,et al. Classification using hierarchical mixtures of experts , 1994, Proceedings of IEEE Workshop on Neural Networks for Signal Processing.
[76] Stewart W. Wilson. ZCS: A Zeroth Level Classifier System , 1994, Evolutionary Computation.
[77] V. Gullapalli,et al. Acquiring robot skills via reinforcement learning , 1994, IEEE Control Systems.
[78] Mark Humphrys. W-learning: Competition among selfish Q-learners , 1995 .
[79] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[80] Long Ji Lin,et al. Reinforcement Learning of Non-Markov Decision Processes , 1995, Artif. Intell..
[81] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[82] B. Pasik-Duncan,et al. Adaptive Control , 1996, IEEE Control Systems.