Concurrent Hierarchical Reinforcement Learning
[1] L. J. Savage,et al. The Foundations of Statistics , 1955 .
[2] Richard Fikes,et al. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving , 1971, IJCAI.
[3] Austin Tate,et al. Generating Project Networks , 1977, IJCAI.
[4] P. Varaiya,et al. Multilayer control of large Markov chains , 1978 .
[6] John McCarthy,et al. Some Philosophical Problems from the Standpoint of Artificial Intelligence , 1987 .
[7] Guy L. Steele,et al. Common Lisp the Language , 1984 .
[10] Tod S. Levitt,et al. Uncertainty in artificial intelligence , 1988 .
[11] Richard P. Lippmann,et al. Proceedings of the 1997 conference on Advances in neural information processing systems 10 , 1998 .
[12] R. Durrett. Probability: Theory and Examples , 1993 .
[13] Glynn Winskel,et al. The formal semantics of programming languages - an introduction , 1993, Foundation of computing series.
[14] Robert L. Grossman,et al. Timed Automata , 1999, CAV.
[15] Andrew McCallum,et al. Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.
[16] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[17] Ralph Johnson,et al. Design Patterns: Elements of Reusable Object-Oriented Software , 1994 .
[18] Geoffrey J. Gordon. Stable Function Approximation in Dynamic Programming , 1995, ICML.
[19] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[20] Leemon C. Baird,et al. Residual Algorithms: Reinforcement Learning with Function Approximation , 1995, ICML.
[22] John N. Tsitsiklis,et al. Analysis of temporal-difference learning with function approximation , 1996, NIPS.
[23] David Andre,et al. Generalized Prioritized Sweeping , 1997, NIPS.
[24] Stuart J. Russell,et al. Reinforcement Learning with Hierarchies of Machines , 1997, NIPS.
[25] Ronen I. Brafman,et al. Planning with Concurrent Interacting Actions , 1997, AAAI/IAAI.
[26] Doina Precup,et al. Multi-time Models for Temporally Abstract Planning , 1997, NIPS.
[27] Milos Hauskrecht,et al. Hierarchical Solution of Markov Decision Processes using Macro-actions , 1998, UAI.
[28] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[29] Ronald E. Parr. Hierarchical control and learning for Markov decision processes , 1998 .
[30] Craig Boutilier,et al. The Dynamics of Reinforcement Learning in Cooperative Multiagent Systems , 1998, AAAI/IAAI.
[31] Sridhar Mahadevan,et al. Optimizing Production Manufacturing Using Reinforcement Learning , 1998, FLAIRS.
[32] Eric A. Hansen,et al. Solving POMDPs by Searching in Policy Space , 1998, UAI.
[33] Rina Dechter,et al. Bucket Elimination: A Unifying Framework for Reasoning , 1999, Artif. Intell..
[34] Hiroaki Kitano,et al. RoboCup Rescue: search and rescue in large-scale disasters as a domain for autonomous agents research , 1999, IEEE SMC'99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028).
[35] Craig Boutilier,et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage , 1999, J. Artif. Intell. Res..
[36] Andrew Y. Ng,et al. Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.
[37] Craig Boutilier,et al. Decision-Theoretic, High-Level Agent Programming in the Situation Calculus , 2000, AAAI/IAAI.
[38] Hector J. Levesque,et al. ConGolog, a concurrent programming language based on the situation calculus , 2000, Artif. Intell..
[39] Thomas G. Dietterich. Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition , 1999, J. Artif. Intell. Res..
[40] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory , 2000, Statistics for Engineering and Information Science.
[41] Sridhar Mahadevan,et al. Hierarchical Memory-Based Reinforcement Learning , 2000, NIPS.
[42] David Andre,et al. Programmable Reinforcement Learning Agents , 2000, NIPS.
[43] Michail G. Lagoudakis,et al. Model-Free Least-Squares Policy Iteration , 2001, NIPS.
[44] Mark S. Fox,et al. Agent-Oriented Supply-Chain Management , 2000 .
[45] David Andre,et al. State abstraction for programmable reinforcement learning agents , 2002, AAAI/IAAI.
[46] Sridhar Mahadevan,et al. Hierarchically Optimal Average Reward Reinforcement Learning , 2002, ICML.
[47] Alex M. Andrew,et al. Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems , 2002 .
[48] Michael I. Jordan,et al. A Minimal Intervention Principle for Coordinated Movement , 2002, NIPS.
[49] Sridhar Mahadevan,et al. Learning to Take Concurrent Actions , 2002, NIPS.
[50] Michail G. Lagoudakis,et al. Coordinated Reinforcement Learning , 2002, ICML.
[51] Carlos Guestrin,et al. Generalizing plans to new environments in relational MDPs , 2003, IJCAI.
[52] A. Barto,et al. Learning and Approximate Dynamic Programming: Scaling Up to the Real World , 2003 .
[53] Sridhar Mahadevan,et al. Hierarchical Policy Gradient Algorithms , 2003, ICML.
[54] Eric Wiewiora,et al. Potential-Based Shaping and Q-Value Initialization are Equivalent , 2003, J. Artif. Intell. Res..
[55] Stuart J. Russell,et al. Q-Decomposition for Reinforcement Learning Agents , 2003, ICML.
[56] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[57] Michael R. James,et al. Predictive State Representations: A New Theory for Modeling Dynamical Systems , 2004, UAI.
[58] Tommi S. Jaakkola,et al. Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms , 2000, Machine Learning.
[59] Allen Newell,et al. Chunking in Soar: The anatomy of a general learning mechanism , 1985, Machine Learning.
[60] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[61] Jennie Si,et al. Hierarchical Approaches to Concurrency, Multiagency, and Partial Observability , 2004 .
[62] Andrew W. Moore,et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time , 1993, Machine Learning.
[63] Emanuel Todorov,et al. From task parameters to motor synergies: A hierarchical framework for approximately optimal control of redundant manipulators , 2005 .
[64] Hector Muñoz-Avila,et al. Applications of SHOP and SHOP2 , 2005, IEEE Intelligent Systems.
[65] Stuart J. Russell,et al. A compact, hierarchically optimal Q-function decomposition , 2006, UAI.
[66] Richard Fikes,et al. Design and Implementation of the CALO Query Manager , 2006, AAAI.
[67] Sridhar Mahadevan,et al. Hierarchical multi-agent reinforcement learning , 2001, AGENTS '01.
[69] John A. Buzacott. International Journal of Flexible Manufacturing Systems: an appreciation , 2007 .