[1] Peter Auer,et al. An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits , 2016, COLT.
[2] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[3] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[4] Thorsten Joachims,et al. Reducing Dueling Bandits to Cardinal Bandits , 2014, ICML.
[5] Éva Tardos,et al. Learning in Games: Robustness of Fast Convergence , 2016, NIPS.
[6] Koby Crammer,et al. A Generalized Online Mirror Descent with Applications to Classification and Regression , 2013, Machine Learning.
[7] Jean-Yves Audibert,et al. Regret Bounds and Minimax Policies under Partial Monitoring , 2010, J. Mach. Learn. Res..
[8] Haipeng Luo,et al. Corralling a Band of Bandit Algorithms , 2016, COLT.
[9] C. Tsallis. Possible Generalization of Boltzmann-Gibbs Statistics , 1988 .
[10] Gábor Lugosi,et al. Prediction, Learning, and Games , 2006 .
[11] Aleksandrs Slivkins,et al. The Best of Both Worlds: Stochastic and Adversarial Bandits , 2012, COLT.
[12] Toru Maruyama. On Some Developments in Convex Analysis (in Japanese) , 1977 .
[13] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985, Advances in Applied Mathematics.
[14] H. Robbins. Some Aspects of the Sequential Design of Experiments , 1952 .
[15] Sébastien Bubeck. Bandits Games and Clustering Foundations , 2010 .
[16] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[17] Renato Paes Leme,et al. Stochastic Bandits Robust to Adversarial Corruptions , 2018, STOC.
[18] Anupam Gupta,et al. Better Algorithms for Stochastic Bandits with Adversarial Corruptions , 2019, COLT.
[19] Shai Shalev-Shwartz,et al. Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..
[20] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[21] Lilian Besson,et al. What Doubling Tricks Can and Can't Do for Multi-Armed Bandits , 2018, ArXiv.
[22] Ambuj Tewari,et al. Online Linear Optimization via Smoothing , 2014, COLT.
[23] Jean-Yves Audibert,et al. Minimax Policies for Adversarial and Stochastic Bandits , 2009, COLT.
[24] Julian Zimmert,et al. Connections Between Mirror Descent, Thompson Sampling and the Information Ratio , 2019, NeurIPS.
[25] Ambuj Tewari,et al. Fighting Bandits with a New Kind of Smoothness , 2015, NIPS.
[26] Rémi Munos,et al. Thompson Sampling: An Optimal Finite Time Analysis , 2012, ArXiv.
[27] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples , 1933 .
[28] Gábor Lugosi,et al. An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits , 2017, COLT.
[29] Aleksandrs Slivkins,et al. One Practical Algorithm for Both Stochastic and Adversarial Bandits , 2014, ICML.
[30] R. Munos,et al. Kullback–Leibler Upper Confidence Bounds for Optimal Sequential Allocation , 2012, arXiv:1210.1136.
[31] Peter L. Bartlett,et al. Best of both worlds: Stochastic & adversarial best-arm identification , 2018, COLT.
[32] Julian Zimmert,et al. Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously , 2019, ICML.
[33] Haipeng Luo,et al. More Adaptive Algorithms for Adversarial Bandits , 2018, COLT.