Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles
[1] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[2] Karthik Sridharan, et al. BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits, 2016, ICML.
[3] Noga Alon, et al. Scale-sensitive dimensions, uniform convergence, and learnability, 1997, JACM.
[4] Gergely Neu, et al. Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits, 2020, COLT.
[5] Philip M. Long, et al. Associative Reinforcement Learning using Linear Probabilistic Concepts, 1999, ICML.
[6] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[7] Vladimir Vovk, et al. Metric entropy in competitive on-line prediction, 2006, arXiv.
[8] Lihong Li, et al. Provable Optimal Algorithms for Generalized Linear Contextual Bandits, 2017, arXiv.
[9] John Langford, et al. Practical Evaluation and Optimization of Contextual Bandit Algorithms, 2018, arXiv.
[10] Michael Kearns, et al. Large-Scale Bandit Problems and KWIK Learning, 2013, ICML.
[11] Benjamin Van Roy, et al. Comments on the Du-Kakade-Wang-Yang Lower Bounds, 2019, arXiv.
[12] Yuhong Yang, et al. Information-theoretic determination of minimax rates of convergence, 1999.
[13] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.
[14] Nello Cristianini, et al. Finite-Time Analysis of Kernelised Contextual Bandits, 2013, UAI.
[15] John Langford, et al. Open Problem: First-Order Regret Bounds for Contextual Bandits, 2017, COLT.
[16] Pierre Gaillard, et al. A Chaining Algorithm for Online Nonparametric Regression, 2015, COLT.
[17] Karthik Sridharan, et al. Statistical Learning and Sequential Prediction, 2014.
[18] Alexandre B. Tsybakov, et al. Introduction to Nonparametric Estimation, 2008, Springer Series in Statistics.
[19] John Langford, et al. Making Contextual Decisions with Low Technical Debt, 2016.
[20] Karthik Sridharan, et al. Online Nonparametric Regression, 2014, arXiv.
[21] Koby Crammer, et al. A generalized online mirror descent with applications to classification and regression, 2013, Machine Learning.
[22] Akshay Krishnamurthy, et al. Contextual semibandits via supervised learning oracles, 2015, NIPS.
[23] Ambuj Tewari, et al. On the Universality of Online Mirror Descent, 2011, NIPS.
[24] Thomas J. Walsh, et al. Knows what it knows: a framework for self-aware learning, 2008, ICML.
[25] Philip M. Long, et al. Reinforcement Learning with Immediate Rewards and Linear Hypotheses, 2003, Algorithmica.
[26] John Langford, et al. Efficient Optimal Learning for Contextual Bandits, 2011, UAI.
[27] Shai Ben-David, et al. Multiclass Learnability and the ERM principle, 2011, COLT.
[28] John Langford, et al. A Contextual Bandit Bake-off, 2018, J. Mach. Learn. Res.
[29] Zheng Wen, et al. New Insights into Bootstrapping for Bandits, 2018, arXiv.
[30] John Langford, et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, 2014, ICML.
[31] Tor Lattimore, et al. Learning with Good Feature Representations in Bandits and in RL with a Generative Model, 2020, ICML.
[32] Alexander A. Sherstov, et al. Cryptographic Hardness for Learning Intersections of Halfspaces, 2006, FOCS.
[33] Sébastien Gerchinovitz, et al. Sparsity Regret Bounds for Individual Sequences in Online Linear Regression, 2011, COLT.
[34] Akshay Krishnamurthy, et al. Contextual bandits with surrogate losses: Margin bounds and efficient algorithms, 2018, NeurIPS.
[35] Eli Upfal, et al. Bandits and Experts in Metric Spaces, 2013, J. ACM.
[36] Karthik Sridharan, et al. Empirical Entropy, Minimax Regret and Minimax Risk, 2013, arXiv.
[37] S. Mendelson, et al. Entropy and the combinatorial dimension, 2002, arXiv math/0203275.
[38] Adam Tauman Kalai, et al. Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression, 2011, NIPS.
[39] Benjamin Van Roy, et al. Eluder Dimension and the Sample Complexity of Optimistic Exploration, 2013, NIPS.
[40] Tor Lattimore, et al. Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits, 2018, ICML.
[41] Amit Daniely, et al. Strongly Adaptive Online Learning, 2015, ICML.
[42] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[43] Yuan Zhou, et al. Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits, 2019, COLT.
[44] Haipeng Luo, et al. Practical Contextual Bandits with Regression Oracles, 2018, ICML.
[45] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[46] Haipeng Luo, et al. Improved Regret Bounds for Oracle-Based Adversarial Contextual Bandits, 2016, NIPS.
[47] Ambuj Tewari, et al. From Ads to Interventions: Contextual Bandits in Mobile Health, 2017, Mobile Health - Sensors, Analytic Methods, and Applications.
[48] Matus Telgarsky, et al. Spectrally-normalized margin bounds for neural networks, 2017, NIPS.
[49] Ruosong Wang, et al. Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?, 2020, ICLR.
[50] John Langford, et al. The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information, 2007, NIPS.
[51] Vladimir Vovk, et al. A game of prediction with expert advice, 1995, COLT.
[52] Aleksandrs Slivkins, et al. Contextual Bandits with Similarity Information, 2009, COLT.
[53] Philip M. Long, et al. Prediction, Learning, Uniform Convergence, and Scale-Sensitive Dimensions, 1998, J. Comput. Syst. Sci.
[54] Vladimir Vovk, et al. Competitive On-line Linear Regression, 1997, NIPS.
[55] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[56] Csaba Szepesvári, et al. Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits, 2012, AISTATS.
[57] Elad Hazan, et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.
[58] John Langford, et al. Contextual Bandit Learning with Predictable Rewards, 2012, AISTATS.
[59] Manfred K. Warmuth, et al. Relative Loss Bounds for On-Line Density Estimation with the Exponential Family of Distributions, 1999, Machine Learning.
[60] Akshay Krishnamurthy, et al. Efficient Algorithms for Adversarial Contextual Learning, 2016, ICML.
[61] Haipeng Luo, et al. Model selection for contextual bandits, 2019, NeurIPS.
[62] Yuanzhi Li, et al. Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits, 2018, ICML.
[63] Philippe Rigollet, et al. Nonparametric Bandits with Covariates, 2010, COLT.
[64] John Langford, et al. Contextual Bandit Algorithms with Supervised Learning Guarantees, 2010, AISTATS.
[65] Tong Zhang, et al. Covering Number Bounds of Certain Regularized Linear Function Classes, 2002, J. Mach. Learn. Res.