Hierarchical reinforcement learning in continuous state and multi-agent environments
[1] Vivek S. Borkar,et al. Learning Algorithms for Markov Decision Processes with Average Cost , 2001, SIAM J. Control. Optim..
[2] Andrew G. Barto,et al. Automatic Discovery of Subgoals in Reinforcement Learning using Diverse Density , 2001, ICML.
[3] Shigenobu Kobayashi,et al. Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward , 1995, ICML.
[4] C. Atkeson,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Real Time , 1993 .
[5] Manuela M. Veloso,et al. Multiagent learning using a variable learning rate , 2002, Artif. Intell..
[6] Andrew Y. Ng,et al. Shaping and policy search in reinforcement learning , 2003 .
[7] Neil Immerman,et al. The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.
[8] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[9] Prasad Tadepalli,et al. Model-Based Average Reward Reinforcement Learning , 1998, Artif. Intell..
[10] Sridhar Mahadevan,et al. Recent Advances in Hierarchical Reinforcement Learning , 2003, Discret. Event Dyn. Syst..
[11] Prasad Tadepalli,et al. Auto-Exploratory Average Reward Reinforcement Learning , 1996, AAAI/IAAI, Vol. 1.
[12] Doina Precup,et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning , 1999, Artif. Intell..
[13] Model-Based Hierarchical Average-Reward Reinforcement Learning , 2002, ICML.
[14] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[15] S. Shankar Sastry,et al. Autonomous Helicopter Flight via Reinforcement Learning , 2003, NIPS.
[16] David Andre,et al. State abstraction for programmable reinforcement learning agents , 2002, AAAI/IAAI.
[17] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[18] Luis E. Ortiz,et al. Nash Propagation for Loopy Graphical Games , 2002, NIPS.
[19] Bernhard Hengst,et al. Discovering Hierarchy in Reinforcement Learning with HEXQ , 2002, ICML.
[20] Kee-Eung Kim,et al. Learning to Cooperate via Policy Search , 2000, UAI.
[21] Michael L. Littman,et al. Graphical Models for Game Theory , 2001, UAI.
[22] Craig Boutilier,et al. Sequential Optimality and Coordination in Multiagent Systems , 1999, IJCAI.
[23] Gerald Tesauro,et al. TD-Gammon: A Self-Teaching Backgammon Program , 1995 .
[24] P. Varaiya,et al. Multilayer control of large Markov chains , 1978 .
[25] Daphne Koller,et al. Policy Iteration for Factored MDPs , 2000, UAI.
[26] Suchi Saria,et al. Probabilistic Plan Recognition in Multiagent Systems , 2004, ICAPS.
[27] Sridhar Mahadevan,et al. Hierarchically Optimal Average Reward Reinforcement Learning , 2002, ICML.
[28] Ronald E. Parr,et al. Hierarchical control and learning for Markov decision processes , 1998 .
[29] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[30] Michael I. Jordan,et al. Massachusetts Institute of Technology, Artificial Intelligence Laboratory and Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences , 1996 .
[31] Craig A. Knoblock. Learning Abstraction Hierarchies for Problem Solving , 1990, AAAI.
[32] A New Value Iteration Method for the Average Cost Dynamic Programming Problem , 1995, SIAM J. Control Optim..
[33] Milind Tambe,et al. Distributed Sensor Networks: A Multiagent Perspective , 2003 .
[34] Andrew G. Barto,et al. PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in Reinforcement Learning , 2002, ICML.
[35] Xi-Ren Cao,et al. Perturbation analysis of discrete event dynamic systems , 1991 .
[36] Victor R. Lesser,et al. Learning to Improve Coordinated Actions in Cooperative Distributed Problem-Solving Environments , 1998, Machine Learning.
[37] Yishay Mansour,et al. Approximate Planning in Large POMDPs via Reusable Trajectories , 1999, NIPS.
[38] Shie Mannor,et al. Dynamic abstraction in reinforcement learning via clustering , 2004, ICML.
[39] Benjamin Van Roy,et al. The linear programming approach to approximate dynamic programming: theory and application , 2002 .
[40] Rodney A. Brooks,et al. A Robust Layered Control System For A Mobile Robot , 1986 .
[41] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[42] Earl D. Sacerdoti,et al. Planning in a Hierarchy of Abstraction Spaces , 1974, IJCAI.
[43] Long-Ji Lin,et al. Reinforcement learning for robots using neural networks , 1992 .
[44] Hassan K. Khalil,et al. Singular perturbation methods in control : analysis and design , 1986 .
[45] Geoffrey J. Gordon,et al. Approximate solutions to Markov decision processes , 1999 .
[46] Benjamin Van Roy. Learning and value function approximation in complex decision processes , 1998 .
[47] Satinder Singh. Transfer of learning by composing solutions of elemental sequential tasks , 2004, Machine Learning.
[48] Ben J. A. Kröse,et al. Learning from delayed rewards , 1995, Robotics Auton. Syst..
[49] Richard E. Korf,et al. Macro-Operators: A Weak Method for Learning , 1985, Artif. Intell..
[50] C. Watkins. Learning from delayed rewards , 1989 .
[51] Carlos Guestrin,et al. Max-norm Projections for Factored MDPs , 2001, IJCAI.
[52] Victor R. Lesser,et al. Multi-agent policies: from centralized ones to decentralized ones , 2002, AAMAS '02.
[53] Leslie Pack Kaelbling,et al. Learning to Achieve Goals , 1993, IJCAI.
[54] Michael L. Littman,et al. Friend-or-Foe Q-learning in General-Sum Games , 2001, ICML.
[55] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[56] Herbert A. Simon,et al. The Sciences of the Artificial , 1970 .
[57] Julie A. Adams,et al. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence , 2001, AI Mag..
[58] Paul R. Cohen,et al. Searching for Planning Operators with Context-Dependent and Probabilistic Effects , 1996, AAAI/IAAI, Vol. 1.
[59] Michael P. Wellman,et al. Multiagent Reinforcement Learning: Theoretical Framework and an Algorithm , 1998, ICML.
[60] Mahesan Niranjan,et al. On-line Q-learning using connectionist systems , 1994 .
[61] Dimitri P. Bertsekas,et al. Neuro-Dynamic Programming , 2009, Encyclopedia of Optimization.
[62] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[63] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[64] Richard S. Sutton,et al. Dyna, an integrated architecture for learning, planning, and reacting , 1990, SIGART Bull..
[65] Allen Newell,et al. Chunking in Soar: The anatomy of a general learning mechanism , 1985, Machine Learning.
[66] Tucker R. Balch,et al. Behavior-based formation control for multirobot teams , 1998, IEEE Trans. Robotics Autom..
[67] Sridhar Mahadevan,et al. Continuous-Time Hierarchical Reinforcement Learning , 2001, ICML.
[68] Andrew G. Barto,et al. A causal approach to hierarchical decomposition of factored MDPs , 2005, ICML.
[69] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[70] Satinder Singh,et al. An Efficient Exact Algorithm for Singly Connected Graphical Games , 2002, NIPS.
[71] Michael O. Duff,et al. Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.
[72] Christos G. Cassandras,et al. Introduction to Discrete Event Systems , 1999, The Kluwer International Series on Discrete Event Dynamic Systems.
[73] Doina Precup,et al. Temporal abstraction in reinforcement learning , 2000, ICML.
[74] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[75] Manuela M. Veloso,et al. Team-partitioned, opaque-transition reinforcement learning , 1999, AGENTS '99.
[76] Thomas G. Dietterich. The MAXQ Method for Hierarchical Reinforcement Learning , 1998, ICML.
[77] Sridhar Mahadevan,et al. Hierarchical Policy Gradient Algorithms , 2003, ICML.
[78] Sridhar Mahadevan,et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning , 1991, Artif. Intell..
[79] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[80] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[81] Sridhar Mahadevan,et al. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results , 2005, Machine Learning.
[82] Yishay Mansour,et al. Nash Convergence of Gradient Dynamics in General-Sum Games , 2000, UAI.
[83] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[84] Richard S. Sutton,et al. Learning to predict by the methods of temporal differences , 1988, Machine Learning.
[85] Jun Morimoto,et al. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning , 2000, Robotics Auton. Syst..
[86] Zhiyuan Ren,et al. A time aggregation approach to Markov decision processes , 2002, Autom..
[87] Vijay R. Konda,et al. Actor-Critic Algorithms , 1999, NIPS.
[88] Richard S. Sutton,et al. Introduction to Reinforcement Learning , 1998 .
[89] Earl D. Sacerdoti,et al. Planning in a hierarchy of abstraction spaces , 1973, IJCAI.
[90] Daphne Koller,et al. Multi-Agent Influence Diagrams for Representing and Solving Games , 2001, IJCAI.
[91] Wei Zhang,et al. A Reinforcement Learning Approach to job-shop Scheduling , 1995, IJCAI.
[92] David Andre,et al. Programmable Reinforcement Learning Agents , 2000, NIPS.
[93] Abhijit Gosavi,et al. Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning , 2007 .
[94] Andrew W. Moore,et al. Distributed Value Functions , 1999, ICML.
[95] Gary L. Drescher,et al. Made-up minds - a constructivist approach to artificial intelligence , 1991 .
[96] Austin Tate,et al. O-Plan: The open Planning Architecture , 1991, Artif. Intell..
[97] Sridhar Mahadevan,et al. Designing Agent Controllers using Discrete-Event Markov Models , 2007 .
[98] R. Howard. Dynamic Programming and Markov Processes , 1960 .
[99] Sridhar Mahadevan,et al. Learning to Take Concurrent Actions , 2002, NIPS.
[100] J. Filar,et al. Competitive Markov Decision Processes , 1996 .
[101] Svetha Venkatesh,et al. Policy Recognition in the Abstract Hidden Markov Model , 2002, J. Artif. Intell. Res..
[102] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[103] Pierfrancesco La Mura. Game Networks , 2000, UAI.
[104] Eduardo D. Sontag,et al. Neural Networks for Control , 1993 .
[105] Ronen I. Brafman,et al. Modeling Agents as Qualitative Decision Makers , 1997, Artif. Intell..
[106] Gang Wang,et al. Hierarchical Optimization of Policy-Coupled Semi-Markov Decision Processes , 1999, ICML.
[107] Andrew G. Barto,et al. Elevator Group Control Using Multiple Reinforcement Learning Agents , 1998, Machine Learning.
[108] Shie Mannor,et al. Q-Cut - Dynamic Discovery of Sub-goals in Reinforcement Learning , 2002, ECML.
[109] Michail G. Lagoudakis,et al. Coordinated Reinforcement Learning , 2002, ICML.
[110] Daphne Koller,et al. Multi-agent algorithms for solving graphical games , 2002, AAAI/IAAI.
[111] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[112] Xin Wang,et al. Batch Value Function Approximation via Support Vectors , 2001, NIPS.
[113] Sebastian Thrun,et al. Finding Structure in Reinforcement Learning , 1994, NIPS.
[114] Kee-Eung Kim,et al. Learning Finite-State Controllers for Partially Observable Environments , 1999, UAI.
[115] Arthur L. Samuel,et al. Some Studies in Machine Learning Using the Game of Checkers , 1967, IBM J. Res. Dev..
[116] Maja J. Mataric,et al. Reinforcement Learning in the Multi-Robot Domain , 1997, Auton. Robots.
[117] Sridhar Mahadevan,et al. Learning to communicate and act using hierarchical reinforcement learning , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..
[118] Dimitri P. Bertsekas,et al. Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems , 1996, NIPS.
[119] Roderic A. Grupen,et al. A feedback control structure for on-line learning tasks , 1997, Robotics Auton. Syst..
[120] Milind Tambe,et al. The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models , 2011, J. Artif. Intell. Res..
[121] Andrew G. Barto,et al. Using relative novelty to identify useful temporal abstractions in reinforcement learning , 2004, ICML.