Thompson Sampling for Factored Multi-Agent Bandits
Timothy Verstraeten | Diederik M. Roijers | Eugenio Bargiacchi | Pieter J. K. Libin | Ann Nowé
[1] Carlos Guestrin,et al. Multiagent Planning with Factored MDPs , 2001, NIPS.
[2] Nicolò Cesa-Bianchi,et al. Combinatorial Bandits , 2012, COLT.
[3] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[4] Jason R. Marden,et al. A Model-Free Approach to Wind Farm Control Using Game Theoretic Methods , 2013, IEEE Transactions on Control Systems Technology.
[5] O. Papaspiliopoulos. High-Dimensional Probability: An Introduction with Applications in Data Science , 2020.
[6] Wei Chen,et al. Combinatorial multi-armed bandit: general framework, results and applications , 2013, ICML.
[7] Peter Vrancx,et al. Learning multi-agent state space representations , 2010, AAMAS.
[8] Craig Boutilier,et al. Planning, Learning and Coordination in Multiagent Decision Processes , 1996, TARK.
[9] Bhaskar Krishnamachari,et al. Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations , 2010, IEEE/ACM Transactions on Networking.
[10] Shipra Agrawal,et al. Further Optimal Regret Bounds for Thompson Sampling , 2012, AISTATS.
[11] Nikos A. Vlassis,et al. Using the Max-Plus Algorithm for Multiagent Decision Making in Coordination Graphs , 2005, BNAIC.
[12] Mathijs de Weerdt,et al. Solving Transition-Independent Multi-Agent MDPs with Sparse Interactions , 2015, AAAI.
[13] Yi Guo,et al. Fleetwide data-enabled reliability improvement of wind turbines , 2019, Renewable and Sustainable Energy Reviews.
[14] Shobha Venkataraman,et al. Context-specific multiagent coordination and planning with factored MDPs , 2002, AAAI/IAAI.
[15] Marco Wiering,et al. Multi-Agent Reinforcement Learning for Traffic Light control , 2000.
[16] Yaoyu Li,et al. Yaw-Misalignment and its Impact on Wind Turbine Loads and Wind Farm Power Output , 2016.
[17] David J. Lunn,et al. The BUGS Book: A Practical Introduction to Bayesian Analysis , 2013.
[18] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933.
[19] Ann Nowé,et al. Thompson Sampling for m-top Exploration , 2019, BNAIC/BENELEARN.
[20] Ann Nowé,et al. Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies , 2017, ECML/PKDD.
[21] Frans A. Oliehoek,et al. Decentralised Online Planning for Multi-Robot Warehouse Commissioning , 2017, AAMAS.
[22] Lihong Li,et al. An Empirical Evaluation of Thompson Sampling , 2011, NIPS.
[23] Gábor Lugosi,et al. Minimax Policies for Combinatorial Prediction Games , 2011, COLT.
[24] Nikos A. Vlassis,et al. Sparse cooperative Q-learning , 2004, ICML.
[25] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn.
[26] Akimichi Takemura,et al. Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors , 2013, AISTATS.
[27] Daphne Koller,et al. Policy Iteration for Factored MDPs , 2000, UAI.
[28] Benjamin Van Roy,et al. Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res.
[29] Carlos Guestrin,et al. Max-norm Projections for Factored MDPs , 2001, IJCAI.
[30] Ann Nowé,et al. Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems , 2018, ICML.