[1] Satyen Kale, et al. Multiarmed Bandits With Limited Expert Advice, 2013, COLT.
[2] John Langford, et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, 2014, ICML.
[3] Haipeng Luo, et al. Online Gradient Boosting, 2015, NIPS.
[4] Manfred K. Warmuth, et al. The weighted majority algorithm, 1989, FOCS.
[5] Eli Upfal, et al. Adapting to a Changing Environment: the Brownian Restless Bandits, 2008, COLT.
[6] Rémi Munos, et al. Adaptive Bandits: Towards the best history-dependent strategy, 2011, AISTATS.
[7] Hsuan-Tien Lin, et al. Active Learning by Learning, 2015, AAAI.
[8] Gábor Lugosi, et al. Prediction, learning, and games, 2006, Cambridge University Press.
[9] Haipeng Luo, et al. Corralling a Band of Bandit Algorithms, 2016, COLT.
[10] Sonia Jaffe, et al. To Groupon or not to Groupon: The profitability of deep discounts, 2014.
[11] Baruch Awerbuch, et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches, 2004, STOC.
[12] John Langford, et al. Contextual Bandit Algorithms with Supervised Learning Guarantees, 2010, AISTATS.
[13] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[14] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[15] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, J. Comput. Syst. Sci.
[16] J. Gittins. Bandit processes and dynamic allocation indices, 1979, J. R. Stat. Soc. B.
[17] Haipeng Luo, et al. Fast Convergence of Regularized Learning in Games, 2015, NIPS.
[18] Omar Besbes, et al. Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-Stationary Rewards, 2014, Stochastic Systems.
[19] P. Whittle. Restless Bandits: Activity Allocation in a Changing World, 1988, J. Appl. Probab.
[20] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW.
[21] Matthew J. Streeter, et al. Tighter Bounds for Multi-Armed Bandits with Expert Advice, 2009, COLT.
[22] David Haussler, et al. How to use expert advice, 1993, STOC.
[23] Csaba Szepesvári, et al. Partial Monitoring - Classification, Regret Bounds, and Algorithms, 2014, Math. Oper. Res.
[24] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn.
[25] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[26] Andreas Krause, et al. Actively Learning Hemimetrics with Applications to Eliciting User Preferences, 2016, ICML.
[27] Ran El-Yaniv, et al. Online Choice of Active Learning Algorithms, 2003, J. Mach. Learn. Res.
[28] Ambuj Tewari, et al. Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret, 2012, ICML.
[29] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.
[30] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.