Pairwise Regression with Upper Confidence Bound for Contextual Bandit with Multiple Actions
[1] Wei Chu, et al. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, 2010, WSDM '11.
[2] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.
[3] John Langford, et al. Contextual Bandit Algorithms with Supervised Learning Guarantees, 2010, AISTATS.
[4] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[5] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[6] Hsuan-Tien Lin, et al. Balancing between Estimated Reward and Uncertainty during News Article Recommendation for ICML 2012 Exploration and Exploitation Challenge, 2012.
[7] H. Vincent Poor, et al. Bandit problems with side observations, 2005, IEEE Transactions on Automatic Control.
[8] Deepayan Chakrabarti, et al. Bandits for Taxonomies: A Model-based Approach, 2007, SDM.
[9] Ulf Brefeld, et al. AUC maximizing support vector learning, 2005.
[10] Robert E. Schapire, et al. Non-Stochastic Bandit Slate Problems, 2010, NIPS.
[11] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW '10.
[12] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[13] Johannes Fürnkranz, et al. Pairwise learning of multilabel classifications with perceptrons, 2008, IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).
[14] John Langford, et al. Efficient Optimal Learning for Contextual Bandits, 2011, UAI.
[15] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.