暂无分享,去创建一个
[1] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[2] Ronen I. Brafman,et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning , 2001, J. Mach. Learn. Res..
[3] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..
[4] Makoto Yokoo,et al. Networked Distributed POMDPs: A Synergy of Distributed Constraint Optimization and POMDPs , 2005, IJCAI.
[5] P. J. Gmytrasiewicz,et al. A Framework for Sequential Planning in Multi-Agent Settings , 2005, AI&M.
[6] Nataliya Sokolovska,et al. Continuous Upper Confidence Trees , 2011, LION.
[7] Shimon Whiteson,et al. Exploiting locality of interaction in factored Dec-POMDPs , 2008, AAMAS.
[8] Pedro U. Lima,et al. Efficient Offline Communication Policies for Factored Multiagent POMDPs , 2011, NIPS.
[9] Gerald Tesauro,et al. Temporal difference learning and TD-Gammon , 1995, CACM.
[10] Claudia V. Goldman,et al. Solving Transition Independent Decentralized Markov Decision Processes , 2004, J. Artif. Intell. Res..
[11] Joshua B. Tenenbaum,et al. Nonparametric Bayesian Policy Priors for Reinforcement Learning , 2010, NIPS.
[12] Michael Fairbank,et al. The divergence of reinforcement learning algorithms with value-iteration and function approximation , 2011, The 2012 International Joint Conference on Neural Networks (IJCNN).
[13] Shie Mannor,et al. Reinforcement learning with Gaussian processes , 2005, ICML.
[14] Simon M. Lucas,et al. A Survey of Monte Carlo Tree Search Methods , 2012, IEEE Transactions on Computational Intelligence and AI in Games.
[15] Lawrence Carin,et al. Learning to Explore and Exploit in POMDPs , 2009, NIPS.
[16] Frans A. Oliehoek,et al. Best Response Bayesian Reinforcement Learning for Multiagent Systems with State Uncertainty , 2014 .
[17] Pascal Poupart,et al. Model-based Bayesian Reinforcement Learning in Partially Observable Domains , 2008, ISAIM.
[18] Jesse Hoey,et al. An analytic solution to discrete Bayesian reinforcement learning , 2006, ICML.
[19] Alborz Geramifard,et al. Decentralized control of Partially Observable Markov Decision Processes using belief space macro-actions , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[20] Richard S. Sutton,et al. Temporal-difference search in computer Go , 2012, Machine Learning.
[21] Kee-Eung Kim,et al. Learning to Cooperate via Policy Search , 2000, UAI.
[22] Guy Shani,et al. Model-Based Online Learning of POMDPs , 2005, ECML.
[23] Andrew G. Barto,et al. Optimal learning: computational procedures for bayes-adaptive markov decision processes , 2002 .
[24] Nikos A. Vlassis,et al. Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res..
[25] Peter Vrancx,et al. Reinforcement Learning: State-of-the-Art , 2012 .
[26] Jaakko Peltonen,et al. Efficient Planning for Factored Infinite-Horizon DEC-POMDPs , 2011, IJCAI.
[27] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[28] Shimon Whiteson,et al. Exploiting Structure in Cooperative Bayesian Games , 2012, UAI.
[29] Victor R. Lesser,et al. Self-organization for coordinating decentralized reinforcement learning , 2010, AAMAS.
[30] Andrew Wang,et al. Bayes-Adaptive Interactive POMDPs , 2012, AAAI.
[31] Nicholas R. Jennings,et al. Decentralized Bayesian reinforcement learning for online agent collaboration , 2012, AAMAS.
[32] Umar Syed,et al. Graphical Models for Bandit Problems , 2011, UAI.
[33] Nikos A. Vlassis,et al. Collaborative Multiagent Reinforcement Learning by Payoff Propagation , 2006, J. Mach. Learn. Res..
[34] Shie Mannor,et al. Bayesian Reinforcement Learning , 2012, Reinforcement Learning.
[35] Carlos Guestrin,et al. Multiagent Planning with Factored MDPs , 2001, NIPS.
[36] Rémi Coulom,et al. Computing "Elo Ratings" of Move Patterns in the Game of Go , 2007, J. Int. Comput. Games Assoc..
[37] Olivier Buffet,et al. Exploiting separability in multiagent planning with continuous-state MDPs , 2014, AAMAS.
[39] Marc Toussaint,et al. Model-free reinforcement learning as mixture learning , 2009, ICML '09.
[40] Michail G. Lagoudakis,et al. Coordinated Reinforcement Learning , 2002, ICML.
[41] G. G. Stokes. "J." , 1890, The New Yale Book of Quotations.
[42] Joelle Pineau,et al. Bayes-Adaptive POMDPs , 2007, NIPS.
[43] Joelle Pineau,et al. A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes , 2011, J. Mach. Learn. Res..
[44] Shlomo Zilberstein,et al. Constraint-based dynamic programming for decentralized POMDPs with structured interactions , 2009, AAMAS.
[45] Joelle Pineau,et al. Online Planning Algorithms for POMDPs , 2008, J. Artif. Intell. Res..
[46] Leslie Pack Kaelbling,et al. All learning is Local: Multi-agent Learning in Global Reward Games , 2003, NIPS.
[47] Peter Dayan,et al. Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search , 2012, NIPS.
[48] Victor R. Lesser,et al. Multiagent reinforcement learning and self-organization in a network of agents , 2007, AAMAS '07.
[49] Craig Boutilier,et al. Coordination in multiagent reinforcement learning: a Bayesian approach , 2003, AAMAS '03.
[50] Nicholas R. Jennings,et al. Decentralised coordination of low-power embedded devices using the max-sum algorithm , 2008, AAMAS.
[51] Milind Tambe,et al. The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models , 2011, J. Artif. Intell. Res..
[52] Shimon Whiteson,et al. Approximate solutions for factored Dec-POMDPs with many agents , 2013, AAMAS.
[53] Frans A. Oliehoek,et al. Decentralized POMDPs , 2012, Reinforcement Learning.
[54] Joel Veness,et al. Monte-Carlo Planning in Large POMDPs , 2010, NIPS.