Integration of Partially Observable Markov Decision Processes and Reinforcement Learning for Simulated Robot Navigation
[1] Carme Torras. Neural Learning for Robot Control, 1994, ECAI.
[2] R. U. Muller, et al. Head-direction cells recorded from the postsubiculum in freely moving rats. I. Description and quantitative analysis, 1990, The Journal of Neuroscience.
[3] Michael A. Arbib, et al. Schema theory, 1998.
[4] Marco Colombetti, et al. Robot Shaping: An Experiment in Behavior Engineering, 1997.
[5] Charles W. Anderson, et al. Strategy Learning with Multilayer Connectionist Representations, 1987.
[6] Leslie Pack Kaelbling, et al. The Synthesis of Digital Machines With Provable Epistemic Properties, 1986, TARK.
[7] H. Bastian. Sensation and Perception. I, 1869, Nature.
[9] Richard S. Sutton, et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming, 1990, ML.
[10] Richard S. Sutton, et al. Temporal credit assignment in reinforcement learning, 1984.
[11] Jean-Arcady Meyer, et al. Hierarchical Map Building and Self-Positioning with MonaLysa, 1996, Adapt. Behav.
[12] Mark Ring. Two methods for hierarchy learning in reinforcement environments, 1993.
[13] Mark B. Ring. Continual learning in reinforcement environments, 1995, GMD-Bericht.
[14] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[15] Sebastian Thrun, et al. Issues in Using Function Approximation for Reinforcement Learning, 1999.
[16] John N. Tsitsiklis, et al. The Complexity of Markov Decision Processes, 1987, Math. Oper. Res.
[17] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[18] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming, 1995, ICML.
[19] Ian H. Witten, et al. An Adaptive Optimal Controller for Discrete-Time Markov Environments, 1977, Inf. Control.
[20] Mark Ring. Sequence Learning with Incremental Higher-Order Neural Networks, 1993.
[21] Wenju Liu, et al. Planning in Stochastic Domains: Problem Characteristics and Approximation, 1996.
[22] Satinder Singh. Soft Dynamic Programming Algorithms: Convergence Proofs, 1993.
[23] Blai Bonet. High-Level Planning and Control with Incomplete Information Using POMDP's, 1998.
[24] Andreas Stafylopatis, et al. Collision-Free Movement of an Autonomous Vehicle Using Reinforcement Learning, 1992, ECAI.
[25] Larry D. Pyeatt, et al. Integrating POMDP and reinforcement learning for a two layer simulated robot architecture, 1999, AGENTS '99.
[26] R. A. Brooks, et al. Intelligence without Representation, 1991, Artif. Intell.
[27] Leslie Pack Kaelbling, et al. Planning With Deadlines in Stochastic Domains, 1993, AAAI.
[28] N. Zhang, et al. Algorithms for partially observable Markov decision processes, 2001.
[29] Judy Goldsmith, et al. Complexity issues in Markov decision processes, 1998, Proceedings of the Thirteenth Annual IEEE Conference on Computational Complexity.
[30] Andrew W. Moore, et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function, 1994, NIPS.
[31] James L. Crowley, et al. Learning locomotion reflexes: A self-supervised neural system for a mobile robot, 1994, Robotics Auton. Syst.
[32] David L. Poole, et al. A Framework for Decision-Theoretic Planning I: Combining the Situation Calculus, Conditional Plans, Probability and Utility, 1996, UAI.
[33] F. Keil. Concepts, Kinds, and Cognitive Development, 1989.
[34] Wolfram Burgard, et al. Coastal Navigation - Robot Motion with Uncertainty, 1998, AAAI.
[35] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.
[36] Lee Spector, et al. Evolving teamwork and coordination with genetic programming, 1996.
[37] Richard S. Sutton, et al. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding, 1996.
[38] Manuela M. Veloso, et al. Tree Based Discretization for Continuous State Space Reinforcement Learning, 1998, AAAI/IAAI.
[39] R. Simmons, et al. Probabilistic Navigation in Partially Observable Environments, 1995.
[40] Philippe Lalanda, et al. A Domain-Specific Software Architecture for Adaptive Intelligent Systems, 1995, IEEE Trans. Software Eng.
[41] M. Littman, et al. Efficient dynamic-programming updates in partially observable Markov decision processes, 1995.
[42] Reid G. Simmons, et al. Risk-Sensitive Planning with Probabilistic Decision Graphs, 1994, KR.
[43] Lonnie Chrisman, et al. Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach, 1992, AAAI.
[44] Alessandro Saffiotti. Some Notes on the Integration of Planning and Reactivity in Autonomous Mobile Robots, 1993.
[45] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.
[46] Maja J. Matarić, et al. Behavior-Based Systems: Key Properties and Implications, 1992.
[47] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.
[48] Sherry Folsom-Meek, et al. Human Performance, 2020, Nature.
[49] Michael P. Georgeff, et al. Decision-Making in an Embedded Reasoning System, 1989, IJCAI.
[50] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.
[51] J. Bruner. Organization of early skilled action, 1973, Child Development.
[52] Mark B. Ring. Learning Sequential Tasks by Incrementally Adding Higher Orders, 1992, NIPS.
[53] Donald C. Wunsch, et al. Convergence of critic-based training, 1997, IEEE International Conference on Systems, Man, and Cybernetics.
[54] Claude Sammut, et al. Automatic construction of reactive control systems using symbolic machine learning, 1996, The Knowledge Engineering Review.
[55] Marcelo H. Ang, et al. Performance of a neuro-model-based robot controller: adaptability and noise rejection, 1992.
[56] T. Smithers, et al. A behavioural approach to robot task planning and off-line programming, 1987.
[57] R. Passingham. Review of The Hippocampus as a Cognitive Map by J. O'Keefe & L. Nadel (Oxford University Press, 1978), 1979, Neuroscience.
[58] Stephen S. Lee, et al. Planning with Partially Observable Markov Decision Processes: Advances in Exact Solution Method, 1998, UAI.
[59] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.
[60] Andrew McCallum, et al. Overcoming Incomplete Perception with Utile Distinction Memory, 1993, ICML.
[61] Frédéric Gruau, et al. Cellular Encoding for interactive evolutionary robotics, 1996.
[62] Charles W. Anderson, et al. Q-Learning with Hidden-Unit Restarting, 1992, NIPS.
[63] Michael P. Wellman, et al. Planning and Control, 1991.
[64] T. Iberall, et al. Neural network architecture for robot hand control, 1989, IEEE Control Systems Magazine.
[65] Chelsea C. White, et al. A survey of solution techniques for the partially observed Markov decision process, 1991, Ann. Oper. Res.
[66] Stuart J. Russell, et al. Approximating Optimal Policies for Partially Observable Stochastic Domains, 1995, IJCAI.
[67] Michael L. Littman, et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes, 1997, UAI.
[68] Stephen Grossberg, et al. A self-organizing neural network model for redundant sensory-motor control, motor equivalence, and tool use, 1992, IJCNN.
[69] Andrew McCallum, et al. Reinforcement learning with selective perception and hidden state, 1996.
[70] Charles W. Anderson, et al. Reinforcement Learning with Modular Neural Networks for Control, 1994.
[71] Nils J. Nilsson, et al. Shakey the Robot, 1984.
[72] R. A. McCallum. First Results with Instance-Based State Identification for Reinforcement Learning, 1994.
[73] Ronald C. Arkin, et al. Behavior-Based Robotics, 1998.
[74] Leemon C. Baird, et al. Residual Algorithms: Reinforcement Learning with Function Approximation, 1995, ICML.
[75] Simon Kasif, et al. A System for Induction of Oblique Decision Trees, 1994, J. Artif. Intell. Res.
[76] Robert James Firby. Adaptive execution in complex dynamic worlds, 1989.
[77] David E. Wilkins, et al. Domain-Independent Planning: Representation and Plan Generation, 1984, Artif. Intell.
[78] Ron Sun, et al. Robust Reasoning: Integrating Rule-Based and Similarity-Based Reasoning, 1995, Artif. Intell.
[79] Reid G. Simmons, et al. Robot Navigation with Markov Models: A Framework for Path Planning and Learning with Limited Computational Resources, 1995, Reasoning with Uncertainty in Robotics.
[80] Prasad Tadepalli, et al. Model-Based Average Reward Reinforcement Learning, 1998, Artif. Intell.
[81] J. O'Keefe, et al. The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat, 1971, Brain Research.
[82] Nicholas Kushmerick, et al. An Algorithm for Probabilistic Planning, 1995, Artif. Intell.
[83] Arthur L. Samuel, et al. Some Studies in Machine Learning Using the Game of Checkers, 1967, IBM J. Res. Dev.
[84] Francois Felix Ingrand, et al. Monitoring and control of spacecraft systems using procedural reasoning, 1990.
[85] Steven L. Salzberg, et al. On growing better decision trees from data, 1996.
[86] Charles W. Anderson, et al. Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning, 1997, ICNN'97.
[87] Jean-Arcady Meyer, et al. Place Sequence Learning for Navigation, 1997, ICANN.
[88] John D. Lowrance, et al. Planning and reacting in uncertain and dynamic environments, 1995, J. Exp. Theor. Artif. Intell.
[89] Hector Geffner, et al. Solving Large POMDPs using Real Time Dynamic Programming, 1998.
[90] Eric A. Hansen, et al. Solving POMDPs by Searching in Policy Space, 1998, UAI.
[91] Larry D. Pyeatt, et al. Reinforcement Learning for Coordinated Reactive Control, 1998.
[92] Jonathan H. Connell, et al. A colony architecture for an artificial creature, 1989.
[93] Jorg-Michael Hasemann, et al. Robot control architectures: application requirements, approaches, and technologies, 1995.
[94] Geoffrey E. Hinton, et al. Feudal Reinforcement Learning, 1992, NIPS.
[95] Rodney A. Brooks, et al. A Robust Layered Control System for a Mobile Robot, 1986, IEEE Journal of Robotics and Automation.
[96] Cynthia Ferrell, et al. Failure Recognition and Fault Tolerance of an Autonomous Robot, 1994, Adapt. Behav.
[97] Arthur L. Samuel, et al. Some studies in machine learning using the game of checkers, 2000, IBM J. Res. Dev.
[98] Maja J. Mataric, et al. Integration of representation into goal-driven behavior-based robots, 1992, IEEE Trans. Robotics Autom.
[99] Edward J. Sondik, et al. The optimal control of partially observable Markov processes, 1971.
[100] Marco Colombetti, et al. Robot Shaping: Developing Autonomous Agents Through Learning, 1994, Artif. Intell.
[101] Gerald Tesauro, et al. Neurogammon: a neural-network backgammon program, 1990, IJCNN.
[102] Milos Hauskrecht, et al. Incremental Methods for Computing Bounds in Partially Observable Markov Decision Processes, 1997, AAAI/IAAI.
[103] A. Barto, et al. Learning and Sequential Decision Making, 1989.
[104] Satinder P. Singh, et al. Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models, 1992, ML.
[105] Leslie Pack Kaelbling, et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons, 1991, IJCAI.
[106] Sean Luke, et al. Genetic Programming Produced Competitive Soccer Softbot Teams for RoboCup97, 1998.
[107] L. Nadel, et al. The Hippocampus as a Cognitive Map, 1978.
[108] G. Monahan. State of the Art - A Survey of Partially Observable Markov Decision Processes: Theory, Models, and Algorithms, 1982.
[109] R. U. Muller, et al. The hippocampus as a cognitive graph, 1996, The Journal of General Physiology.
[110] Rodney A. Brooks, et al. Integrated systems based on behaviors, 1991, SIGART Bull.
[111] Roderic A. Grupen, et al. Robust Reinforcement Learning in Motion Planning, 1993, NIPS.
[112] Naohiro Fukumura, et al. Learning goal-directed sensory-based navigation of a mobile robot, 1994, Neural Networks.
[113] J. Millán, et al. A Reinforcement Connectionist Approach to Robot Path Finding in Non-Maze-Like Environments, 2004, Machine Learning.
[114] Kurt W. Fischer, et al. Human Development: From Conception Through Adolescence, 1984.
[115] M. Littman. The Witness Algorithm: Solving Partially Observable Markov Decision Processes, 1994.
[116] Karen Zita Haigh, et al. A layered architecture for office delivery robots, 1997, AGENTS '97.
[117] Reid G. Simmons, et al. Probabilistic Robot Navigation in Partially Observable Environments, 1995, IJCAI.
[118] Michael I. Jordan, et al. Technical report, MIT Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, 1996.
[119] P. Grobstein. From Animals to Animats 2: Proceedings of the Second International Conference on Simulation of Adaptive Behavior, 1994.
[120] Ulrich Nehmzow, et al. Autonomous Acquisition of Sensor-Motor Couplings in Robots, 1994.
[121] J. Lammens, et al. Behavior Based AI, Cognitive Processes, and Emergent Behaviors in Autonomous Agents, 1993.
[122] Leslie Pack Kaelbling, et al. Acting under uncertainty: discrete Bayesian models for mobile-robot navigation, 1996, IROS.
[123] Satinder P. Singh, et al. The Efficient Learning of Multiple Task Sequences, 1991, NIPS.
[124] Leslie Pack Kaelbling, et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.
[125] G. B. Andeen, et al. Structured neural-network approach to robot motion control, 1991, IJCNN.
[126] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.
[127] S.J.J. Smith, et al. Empirical Methods for Artificial Intelligence, 1995.
[128] John N. Tsitsiklis, et al. Analysis of temporal-difference learning with function approximation, 1996, NIPS.