Top-$k$ Combinatorial Bandits with Full-Bandit Feedback