Planning, Learning and Coordination in Multiagent Decision Processes
[1] Ronald A. Howard. Dynamic Programming and Markov Processes , 1960 .
[2] Karl Johan Åström,et al. Optimal control of Markov processes with incomplete state information , 1965 .
[3] David Lewis. Convention: A Philosophical Study , 1986 .
[4] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..
[5] M. Puterman,et al. Modified Policy Iteration Algorithms for Discounted Markov Decision Problems , 1978 .
[6] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res..
[7] W. Hamilton,et al. The evolution of cooperation. , 1984, Science.
[8] David M. Kreps,et al. Sequential Equilibria , 1982 .
[9] Paul J. Schweitzer,et al. Iterative Aggregation-Disaggregation Procedures for Discounted Semi-Markov Reward Processes , 1985, Oper. Res..
[10] John C. Harsanyi,et al. A General Theory of Equilibrium Selection in Games , 1989 .
[11] D. Bertsekas,et al. Adaptive aggregation methods for infinite horizon dynamic programming , 1989 .
[12] Keiji Kanazawa,et al. A model for reasoning about persistence and causation , 1989 .
[13] Richard S. Sutton,et al. Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming , 1990, ML.
[14] A. Moore. Variable Resolution Dynamic Programming , 1991, ML.
[15] Leslie Pack Kaelbling,et al. Input Generalization in Delayed Reinforcement Learning: An Algorithm and Performance Comparisons , 1991, IJCAI.
[16] Roger B. Myerson,et al. Game Theory: Analysis of Conflict , 1991 .
[17] Michael P. Wellman,et al. Planning and Control , 1991 .
[18] Geoffrey E. Hinton,et al. Feudal Reinforcement Learning , 1992, NIPS.
[19] P. Dayan. The Convergence of TD(λ) for General λ , 1992, Machine Learning.
[20] Moshe Tennenholtz,et al. On the Synthesis of Useful Social Laws for Artificial Agent Societies (Preliminary Report) , 1992, AAAI.
[21] Moshe Tennenholtz,et al. Emergent Conventions in Multi-Agent Systems: Initial Experimental Results and Observations (Preliminary Report) , 1992, KR.
[22] Leslie Pack Kaelbling,et al. Learning in embedded systems , 1993 .
[23] H. Young,et al. The Evolution of Conventions , 1993 .
[24] Leslie Pack Kaelbling,et al. Planning With Deadlines in Stochastic Domains , 1993, AAAI.
[25] R. Rob,et al. Learning, Mutation, and Long Run Equilibria in Games , 1993 .
[26] Holly A. Yanco,et al. An adaptive communication protocol for cooperating mobile robots , 1993 .
[27] D. Fudenberg,et al. Steady state learning and Nash equilibrium , 1993 .
[28] Gerhard Weiss,et al. Learning to Coordinate Actions in Multi-Agent-Systems , 1993, IJCAI.
[29] E. Kalai,et al. Rational Learning Leads to Nash Equilibrium , 1993 .
[30] Craig Boutilier,et al. Using Abstractions for Decision-Theoretic Planning with Time Constraints , 1994, AAAI.
[31] Stuart J. Russell,et al. Control Strategies for a Stochastic Planner , 1994, AAAI.
[32] Andrew W. Moore,et al. Generalization in Reinforcement Learning: Safely Approximating the Value Function , 1994, NIPS.
[33] Eithan Ephrati,et al. Divide and Conquer in Multi-Agent Planning , 1994, AAAI.
[34] Leslie Pack Kaelbling,et al. Acting Optimally in Partially Observable Stochastic Domains , 1994, AAAI.
[35] Sridhar Mahadevan,et al. To Discount or Not to Discount in Reinforcement Learning: A Case Study Comparing R Learning and Q Learning , 1994, ICML.
[36] Maja J. Mataric,et al. Reward Functions for Accelerated Learning , 1994, ICML.
[37] Michael L. Littman,et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.
[38] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[39] Sandip Sen,et al. Learning to Coordinate without Sharing Information , 1994, AAAI.
[40] Craig Boutilier,et al. Integrating Planning and Execution in Stochastic Domains , 1994, UAI.
[41] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[42] Thomas G. Dietterich,et al. Explanation-Based Learning and Reinforcement Learning: A Unified View , 1995, Machine-mediated learning.
[43] Craig Boutilier,et al. Process-Oriented Planning and Average-Reward Optimality , 1995, IJCAI.
[44] Thomas Dean,et al. Decomposition Techniques for Planning in Stochastic Domains , 1995, IJCAI.
[45] Reid G. Simmons,et al. Probabilistic Robot Navigation in Partially Observable Environments , 1995, IJCAI.
[46] Sandip Sen,et al. Multiagent Coordination with Learning Classifier Systems , 1995, Adaption and Learning in Multi-Agent Systems.
[47] Craig Boutilier,et al. Exploiting Structure in Policy Construction , 1995, IJCAI.
[48] Andrew G. Barto,et al. Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell..
[49] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[50] Craig Boutilier,et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations , 1996, AAAI/IAAI, Vol. 2.
[51] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[52] Craig Boutilier,et al. Approximate Value Trees in Structured Dynamic Programming , 1996, ICML.
[53] T. Dean,et al. Planning under uncertainty: structural assumptions and computational leverage , 1996 .