Making Reinforcement Learning Work on Real Robots
[1] R. Bellman. Dynamic programming, 1957, Science.
[2] N. Draper, et al. Applied Regression Analysis, 1966.
[3] Richard O. Duda, et al. Use of the Hough transformation to detect lines and curves in pictures, 1972, CACM.
[4] James S. Albus, et al. A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller, 1975.
[5] James S. Albus, et al. Data Storage in the Cerebellar Model Articulation Controller (CMAC), 1975.
[6] James S. Albus, et al. A New Approach to Manipulator Control: The Cerebellar Model Articulation Controller (CMAC), 1975.
[7] Jon Louis Bentley, et al. An Algorithm for Finding Best Matches in Logarithmic Expected Time, 1977, TOMS.
[8] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm, 1977.
[9] Bruce G. Batchelor, et al. Pattern Recognition: Ideas in Practice, 1978.
[10] R. Cook. Influential Observations in Linear Regression, 1979.
[11] W. W. Muir, et al. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 1980.
[12] Jon Louis Bentley, et al. Multidimensional divide-and-conquer, 1980, CACM.
[13] James S. Albus, et al. Brains, behavior, and robotics, 1981.
[14] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[15] Antonin Guttman, et al. R-trees: a dynamic index structure for spatial searching, 1984, SIGMOD '84.
[16] James L. McClelland, et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, 1986.
[17] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[18] N. Littlestone. Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm, 1987, 28th Annual Symposium on Foundations of Computer Science (SFCS 1987).
[19] David Chapman, et al. Pengi: An Implementation of a Theory of Activity, 1987, AAAI.
[20] David E. Goldberg, et al. Genetic Algorithms in Search, Optimization, and Machine Learning, 1988.
[21] C. Watkins. Learning from delayed rewards, 1989.
[22] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989.
[23] Rodney A. Brooks, et al. Learning to Coordinate Behaviors, 1990, AAAI.
[24] R. Ratcliff. Connectionist models of recognition memory: constraints imposed by learning and forgetting functions, 1990, Psychological Review.
[25] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[26] Ronald L. Rivest, et al. Introduction to Algorithms, 1990.
[27] A. Moore. Variable Resolution Dynamic Programming, 1991, ML.
[28] Belur V. Dasarathy, et al. Nearest neighbor (NN) norms: NN pattern classification techniques, 1991.
[29] J. Friedman. Multivariate adaptive regression splines, 1991.
[30] Thomas G. Dietterich, et al. Learning with Many Irrelevant Features, 1991, AAAI.
[31] Steven D. Whitehead, et al. A Complexity Analysis of Cooperative Mechanisms in Reinforcement Learning, 1991, AAAI.
[32] Paul E. Utgoff, et al. Two Kinds of Training Information For Evaluation Function Learning, 1991, AAAI.
[33] Gerald Tesauro, et al. Practical Issues in Temporal Difference Learning, 1992, Mach. Learn.
[34] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.
[35] Larry A. Rendell, et al. The Feature Selection Problem: Traditional Methods and a New Algorithm, 1992, AAAI.
[36] Paul E. Utgoff, et al. A Teaching Method for Reinforcement Learning, 1992, ML.
[37] Jing Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adapt. Behav.
[38] Dean A. Pomerleau, et al. Neural Network Perception for Mobile Robot Guidance, 1993.
[39] Tom M. Mitchell, et al. An Apprentice-Based Approach to Knowledge Acquisition, 1993, Artif. Intell.
[40] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[41] Gerald Tesauro, et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play, 1994, Neural Computation.
[42] R. Palmer, et al. Introduction to the theory of neural computation, 1994, The Advanced Book Program.
[43] Marco Colombetti, et al. Robot Shaping: Developing Autonomous Agents Through Learning, 1994, Artif. Intell.
[44] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[45] Masayuki Inaba, et al. Learning by watching: extracting reusable task knowledge from visual observation of human performance, 1994, IEEE Trans. Robotics Autom.
[46] Stefan Schaal, et al. Robot learning by nonparametric regression, 1994, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS '94).
[47] Stefan Schaal, et al. Memory-based robot learning, 1994, Proceedings of the 1994 IEEE International Conference on Robotics and Automation.
[48] Benjamin Van Roy, et al. Feature-based methods for large scale dynamic programming, 1995.
[49] Gerald Tesauro, et al. Temporal difference learning and TD-Gammon, 1995, CACM.
[50] Stefan Schaal, et al. From Isolation to Cooperation: An Alternative View of a System of Experts, 1995, NIPS.
[51] Thomas G. Dietterich, et al. High-Performance Job-Shop Scheduling With A Time-Delay TD(λ) Network, 1995, NIPS.
[52] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.
[53] Wei Zhang, et al. A Reinforcement Learning Approach to Job-Shop Scheduling, 1995, IJCAI.
[54] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1995, NIPS.
[55] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[56] Hobart R. Everett, et al. Sensors for Mobile Robots: Theory and Application, 1995.
[57] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[58] Andrew W. Moore, et al. Learning Evaluation Functions for Large Acyclic Domains, 1996, ICML.
[59] Timothy M. Chan. Output-sensitive results on convex hulls, extreme points, and related problems, 1996, Discret. Comput. Geom.
[60] Daphne Koller, et al. Toward Optimal Feature Selection, 1996, ICML.
[61] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Athena Scientific.
[62] Wei Zhang, et al. Reinforcement learning for job shop scheduling, 1996.
[63] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.
[64] Dimitri P. Bertsekas, et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems, 1996, NIPS.
[65] D. Randall Wilson, et al. Advances in instance-based learning algorithms, 1997.
[66] Stefan Schaal, et al. Robot Learning From Demonstration, 1997, ICML.
[67] Ashwin Ram, et al. Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces, 1997, Adapt. Behav.
[68] Michael Kaiser, et al. Transfer of Elementary Skills via Human-Robot Interaction, 1997, Adapt. Behav.
[69] Maja J. Mataric, et al. Reinforcement Learning in the Multi-Robot Domain, 1997, Auton. Robots.
[70] Andrew W. Moore, et al. Efficient Locally Weighted Polynomial Regression Predictions, 1997, ICML.
[71] Doina Precup, et al. Exponentiated Gradient Methods for Reinforcement Learning, 1997, ICML.
[72] Jude W. Shavlik, et al. Creating advice-taking reinforcement learners, 1998.
[73] Andrew W. Moore, et al. Gradient Descent for General Reinforcement Learning, 1998, NIPS.
[74] Geoffrey J. Gordon, et al. Approximate solutions to Markov decision processes, 1999.
[75] Andrew W. Moore, et al. A Nonparametric Approach to Noisy and Costly Optimization, 2000, ICML.
[76] Michael R. M. Jenkin, et al. Computational principles of mobile robotics, 2000.
[77] W. Smart, et al. Practical Reinforcement Learning, 2000, ICML.
[78] Martin C. Martin, et al. Visual obstacle avoidance using genetic programming: first results, 2001.
[79] Minoru Asada, et al. Purposive behavior acquisition for a real robot by vision-based reinforcement learning, 1995, Machine Learning.
[80] Peter Dayan, et al. The convergence of TD(λ) for general λ, 1992, Machine Learning.
[81] Andrew W. Moore, et al. The Parti-game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-spaces, 1993, Machine Learning.
[82] Andrew W. Moore, et al. Locally Weighted Learning for Control, 1997, Artificial Intelligence Review.
[83] Ronald J. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.
[84] Richard S. Sutton, et al. Reinforcement learning with replacing eligibility traces, 1996, Machine Learning.
[85] John N. Tsitsiklis, et al. Asynchronous Stochastic Approximation and Q-Learning, 1994, Machine Learning.
[86] Andrew W. Moore, et al. Locally Weighted Learning, 1997, Artificial Intelligence Review.
[87] Long Ji Lin, et al. Self-improving reactive agents based on reinforcement learning, planning and teaching, 1992, Machine Learning.
[88] R. Simmons, et al. The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms, 1996, Machine Learning.
[89] Andrew G. Barto, et al. Elevator Group Control Using Multiple Reinforcement Learning Agents, 1998, Machine Learning.
[90] Rémi Munos, et al. A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions, 2000, Machine Learning.
[91] Andrew W. Moore, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.
[92] Jing Peng, et al. Incremental multi-step Q-learning, 1994, Machine Learning.