Online Learning with Diverse User Preferences
Cong Shen, Jing Yang, Ruida Zhou, Chao Gan