Learning to Interact With Learning Agents