Adapting to Misspecification in Contextual Bandits
Julian Zimmert | Claudio Gentile | Dylan J. Foster | Mehryar Mohri