Problem solving with reinforcement learning
[1] M. Puterman, et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems, 1978.
[2] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[3] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[4] Rodney A. Brooks, et al. A Robust Layered Control System for a Mobile Robot, 1986.
[5] S. Zucker, et al. Toward Efficient Trajectory Planning: The Path-Velocity Decomposition, 1986.
[6] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.
[7] Marcel Schoppers, et al. Universal Plans for Reactive Robots in Unpredictable Environments, 1987, IJCAI.
[8] Ken-ichi Funahashi, et al. On the approximate realization of continuous mappings by neural networks, 1989, Neural Networks.
[9] Kumpati S. Narendra, et al. Learning automata - an introduction, 1989.
[10] John N. Tsitsiklis, et al. Parallel and distributed computation, 1989.
[11] George Cybenko, et al. Approximation by superpositions of a sigmoidal function, 1989, Math. Control. Signals Syst.
[12] Michael I. Jordan, et al. Learning to Control an Unstable System with Forward Modeling, 1989, NIPS.
[13] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.
[14] Kurt Hornik, et al. Multilayer feedforward networks are universal approximators, 1989, Neural Networks.
[15] Paul J. Werbos, et al. Backpropagation Through Time: What It Does and How to Do It, 1990, Proc. IEEE.
[16] Oussama Khatib, et al. Real-Time Obstacle Avoidance for Manipulators and Mobile Robots, 1985, Autonomous Robot Vehicles.
[17] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[18] John C. Platt. A Resource-Allocating Network for Function Interpolation, 1991, Neural Computation.
[19] John E. W. Mayhew, et al. Obstacle Avoidance through Reinforcement Learning, 1991, NIPS.
[20] Michael I. Jordan, et al. Hierarchies of Adaptive Experts, 1991, NIPS.
[21] Carlos D. Brody, et al. Fast Learning with Predictive Forward Models, 1991, NIPS.
[22] Myung Won Kim, et al. An efficient hidden node reduction technique for multilayer perceptrons, 1991, Proceedings of the 1991 IEEE International Joint Conference on Neural Networks.
[23] Jean-Claude Latombe, et al. Robot Motion Planning: A Distributed Representation Approach, 1991, Int. J. Robotics Res.
[24] Sebastian Thrun, et al. Active Exploration in Dynamic Environments, 1991, NIPS.
[25] Qiuming Zhu, et al. Hidden Markov model for dynamic obstacle avoidance of mobile robot navigation, 1991, IEEE Trans. Robotics Autom.
[26] Michael I. Jordan, et al. Task Decomposition Through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks, 1990, Cogn. Sci.
[27] S. Thrun. Efficient Exploration in Reinforcement Learning, 1992.
[28] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.
[29] J. Millán, et al. A Reinforcement Connectionist Approach to Robot Path Finding in Non-Maze-Like Environments, 1992, Machine Learning.
[30] Steven J. Bradtke, et al. Reinforcement Learning Applied to Linear Quadratic Regulation, 1992, NIPS.
[31] Charles W. Anderson, et al. Q-Learning with Hidden-Unit Restarting, 1992, NIPS.
[32] Juan Carlos Santamaría, et al. Multistrategy Learning in Reactive Control Systems for Autonomous Robotic Navigation, 1993, Informatica.
[33] L.-J. Lin, et al. Hierarchical learning of robot skills by reinforcement, 1993, IEEE International Conference on Neural Networks.
[34] Russell Reed, et al. Pruning algorithms - a survey, 1993, IEEE Trans. Neural Networks.
[35] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.
[36] Martin Fodslette Møller, et al. A scaled conjugate gradient algorithm for fast supervised learning, 1993, Neural Networks.
[37] W. J. Fitzgerald, et al. Optimization schemes for neural networks, 1993.
[38] Ronald J. Williams, et al. Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions, 1993.
[39] Anton Schwartz, et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards, 1993, ICML.
[40] Ronald J. Williams, et al. Analysis of Some Incremental Variants of Policy Iteration: First Steps Toward Understanding Actor-Critic Learning Systems, 1993.
[41] Jing Peng, et al. Efficient Learning and Planning Within the Dyna Framework, 1993, Adapt. Behav.
[42] Long Ji Lin, et al. Scaling Up Reinforcement Learning for Robot Control, 1993, International Conference on Machine Learning.
[43] Gregory J. Wolff, et al. Optimal Brain Surgeon and general network pruning, 1993, IEEE International Conference on Neural Networks.
[44] Tony J. Prescott, et al. Explorations in Reinforcement and Model-based Learning, 1994.
[45] Richard S. Sutton. On Step-Size and Bias in Temporal-Difference Learning, 1994.
[46] Michael I. Jordan, et al. Massachusetts Institute of Technology Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[47] Sridhar Mahadevan, et al. To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning, 1994, ICML.
[48] Chen-Khong Tham, et al. Modular on-line function approximation for scaling up reinforcement learning, 1994.
[49] Mahesan Niranjan, et al. On-line Q-learning using connectionist systems, 1994.
[50] B. Ravindran, et al. A tutorial survey of reinforcement learning, 1994.
[51] Stewart W. Wilson. ZCS: A Zeroth Level Classifier System, 1994, Evolutionary Computation.
[52] V. Gullapalli, et al. Acquiring robot skills via reinforcement learning, 1994, IEEE Control Systems.
[53] Martin A. Riedmiller, et al. Advanced supervised learning in multi-layer perceptrons — From backpropagation to adaptive learning algorithms, 1994.
[54] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[55] Paweł Cichosz. Truncating Temporal Differences: On the Efficient Implementation of TD(λ) for Reinforcement Learning, 1995.
[56] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[57] Richard S. Sutton, et al. Reinforcement Learning with Replacing Eligibility Traces, 1996, Machine Learning.
[58] Leslie Pack Kaelbling, et al. On reinforcement learning for robots, 1996, IROS.
[59] Thomas G. Dietterich. What is machine learning?