暂无分享,去创建一个
[1] T. L. Lai Andherbertrobbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .
[2] John Langford,et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information , 2007, NIPS.
[3] Aleksandrs Slivkins,et al. Contextual Bandits with Similarity Information , 2009, COLT.
[4] Wei Chu,et al. Contextual Bandits with Linear Payoff Functions , 2011, AISTATS.
[5] John Langford,et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits , 2014, ICML.
[6] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[7] Bhaskar Krishnamachari,et al. Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations , 2010, IEEE/ACM Transactions on Networking.
[8] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem , 1995, Advances in Applied Probability.
[9] Yi Gai,et al. Distributed Stochastic Online Learning Policies for Opportunistic Spectrum Access , 2014, IEEE Transactions on Signal Processing.
[10] W. Hoeffding. Probability Inequalities for sums of Bounded Random Variables , 1963 .
[11] Sattar Vakili,et al. Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems , 2011, IEEE Journal of Selected Topics in Signal Processing.
[12] R. Srikant,et al. Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits , 2015, NIPS.
[13] John Langford,et al. Resourceful Contextual Bandits , 2014, COLT.
[14] Wei Chu,et al. A contextual-bandit approach to personalized news article recommendation , 2010, WWW '10.
[15] Martin Pál,et al. Contextual Multi-Armed Bandits , 2010, AISTATS.
[16] John Langford,et al. Efficient Optimal Learning for Contextual Bandits , 2011, UAI.
[17] Apostolos Burnetas,et al. Optimal Adaptive Policies for Markov Decision Processes , 1997, Math. Oper. Res..
[18] Peter Auer,et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs , 2003, J. Mach. Learn. Res..
[19] Shipra Agrawal,et al. Thompson Sampling for Contextual Bandits with Linear Payoffs , 2012, ICML.