Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits
Sanmay Das | Brendan Juba | Mithun Chakraborty | Kai Yee Phoebe Chua
[1] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .
[2] Naumaan Nayyar,et al. Multi-player multi-armed bandits: Decentralized learning with IID rewards , 2012, 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[3] H. Roche,et al. Why Copy Others? Insights from the Social Learning Strategies Tournament , 2010 .
[4] Mehryar Mohri,et al. Multi-armed Bandit Algorithms and Empirical Evaluation , 2005, ECML.
[5] Boaz Barak. Computational Models , 2011, Encyclopedia of Parallel Computing.
[6] Eshcar Hillel,et al. Distributed Exploration in Multi-Armed Bandits , 2013, NIPS.
[7] Vladimir Vovk,et al. Aggregating strategies , 1990, COLT '90.
[8] N. Jojic,et al. FaceCerts , 2003, IEEE Transactions on Signal Processing: Supplement on Secure Media.
[9] K. Schlag. Why Imitate, and If So, How? A Boundedly Rational Approach to Multi-armed Bandits , 1998 .
[10] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[11] Nicolò Cesa-Bianchi,et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem , 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.
[12] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[13] Grey Giddins,et al. Statistics , 2016, The Journal of hand surgery, European volume.
[14] Sarit Kraus,et al. Collaborative Plans for Complex Group Action , 1996, Artif. Intell.
[15] Li Zhang,et al. Information sharing in distributed stochastic bandits , 2015, 2015 IEEE Conference on Computer Communications (INFOCOM).
[16] Peter Stone,et al. Cooperating with Unknown Teammates in Complex Domains: A Robot Soccer Case Study of Ad Hoc Teamwork , 2015, AAAI.
[17] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput.
[18] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.
[19] Sarit Kraus,et al. To teach or not to teach?: decision making under uncertainty in ad hoc teams , 2010, AAMAS.
[20] Subramanian Ramamoorthy,et al. A game-theoretic model and best-response learning method for ad hoc coordination in multiagent systems , 2013, AAMAS.
[21] Christos Dimitrakakis,et al. Differentially private, multi-agent multi-armed bandits , 2015, EWRL 2015.
[22] Manfred K. Warmuth,et al. The Weighted Majority Algorithm , 1992 .
[23] István Hegedüs,et al. Gossip-based distributed stochastic bandit algorithms , 2013, ICML.
[24] Sarit Kraus,et al. Ad Hoc Autonomous Agent Teams: Collaboration without Pre-Coordination , 2010, AAAI.
[25] Carlos Gershenson,et al. Information and Computation , 2013, Handbook of Human Computation.
[26] Craig Boutilier,et al. A POMDP formulation of preference elicitation problems , 2002, AAAI/IAAI.
[27] Grégory Bonnet,et al. Multi-Armed Bandit Policies for Reputation Systems , 2014, PAAMS.
[28] Qing Zhao,et al. Distributed Learning in Multi-Armed Bandit With Multiple Players , 2009, IEEE Transactions on Signal Processing.
[29] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[30] T. Lillicrap,et al. Why Copy Others? Insights from the Social Learning Strategies Tournament , 2010, Science.
[31] Dean Phillips Foster. Prediction in the Worst Case , 1991 .
[32] Matthew E. Taylor,et al. Identifying and Tracking Switching, Non-Stationary Opponents: A Bayesian Approach , 2016, AAAI Workshop: Multiagent Interaction without Prior Coordination.
[33] Ronald A. Howard,et al. Information Value Theory , 1966, IEEE Trans. Syst. Sci. Cybern.
[34] Daphne Koller,et al. Making Rational Decisions Using Adaptive Utility Elicitation , 2000, AAAI/IAAI.
[35] Milind Tambe,et al. Towards Flexible Teamwork , 1997, J. Artif. Intell. Res.
[36] Sarit Kraus,et al. Communicating with Unknown Teammates , 2014, ECAI.
[37] Victor R. Lesser,et al. Designing a Family of Coordination Algorithms , 1997, ICMAS.
[38] K. Pearson,et al. Biometrika , 1902, The American Naturalist.
[39] Y. Freund,et al. Adaptive game playing using multiplicative weights , 1999 .
[40] 中山 幹夫,et al. Games and Economic Behavior of Bounded Rationality , 2016 .
[41] Richard Gonzalez,et al. Computational Models for the Combination of Advice and Individual Learning , 2009, Cogn. Sci.
[42] L. Goddard,et al. Operations Research (OR) , 2007 .
[43] J. Gittins,et al. A dynamic allocation index for the discounted multiarmed bandit problem , 1979 .