Contextual Bandits under Delayed Feedback
Tor Lattimore | Alexandra Carpentier | Beyza Ermiş | Giovanni Zappella | Michael Brueckner | Claire Vernade
[1] Renyuan Xu et al. Learning in Generalized Linear Contextual Bandits with Stochastic Delays, 2019, NeurIPS.
[2] Lihong Li et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[3] Olivier Chapelle et al. Modeling delayed feedback in display advertising, 2014, KDD.
[4] András György et al. Online Learning under Delayed Feedback, 2013, ICML.
[5] Claudio Gentile et al. Nonstochastic Bandits with Composite Anonymous Feedback, 2018, COLT.
[6] Csaba Szepesvári et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[7] Tor Lattimore et al. Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits, 2018, ICML.
[8] John C. Duchi et al. Distributed delayed stochastic optimization, 2011, 2012 IEEE 51st Conference on Decision and Control (CDC).
[9] András György et al. Learning from Delayed Outcomes via Proxies with Applications to Recommender Systems, 2018, ICML.
[10] Peter Auer et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.
[11] Li Zhou et al. A Survey on Contextual Multi-armed Bandits, 2015, arXiv.
[12] Wei Chu et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[13] Alessandro Lazaric et al. Linear Thompson Sampling Revisited, 2016, AISTATS.
[14] Richard Combes et al. Stochastic Online Shortest Path Routing: The Value of Feedback, 2013, IEEE Transactions on Automatic Control.
[15] Gergely Neu et al. Explore no more: Improved high-probability regret bounds for non-stochastic bandits, 2015, NIPS.
[16] Andreas Krause et al. Parallelizing Exploration-Exploitation Tradeoffs with Gaussian Process Bandit Optimization, 2012, ICML.
[17] Vianney Perchet et al. Stochastic Bandit Models for Delayed Conversions, 2017, UAI.
[18] Stephen G. Eick et al. The two-armed bandit with delayed responses, 1988.
[19] Travis Mandel et al. Towards More Practical Reinforcement Learning, 2015, IJCAI.
[20] Eustache Diemert et al. Attribution Modeling Increases Efficiency of Bidding in Display Advertising, 2017, ADKDD@KDD.
[21] András György et al. Learning from Delayed Outcomes with Intermediate Observations, 2018, arXiv.
[22] Yuhong Yang et al. Randomized Allocation with Nonparametric Estimation for Contextual Multi-Armed Bandits with Delayed Rewards, 2019, Statistics & Probability Letters.
[23] Robert D. Nowak et al. Scalable Generalized Linear Bandits: Online Computation and Hashing, 2017, NIPS.
[24] John Langford et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, 2014, ICML.
[25] Stefano Ermon et al. Best arm identification in multi-armed bandits with delayed feedback, 2018, AISTATS.
[26] Yuya Yoshikawa et al. A Nonparametric Delayed Feedback Model for Conversion Rate Prediction, 2018, arXiv.
[27] Shipra Agrawal et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[28] John Langford et al. Contextual Bandit Algorithms with Supervised Learning Guarantees, 2010, AISTATS.
[29] Csaba Szepesvári et al. Bandits with Delayed, Aggregated Anonymous Feedback, 2017, ICML.
[30] Aurélien Garivier et al. Parametric Bandits: The Generalized Linear Case, 2010, NIPS.
[31] Csaba Szepesvári et al. Bandits with Delayed Anonymous Feedback, 2017, arXiv.
[32] Georgios B. Giannakis et al. Bandit Online Learning with Unknown Delays, 2018, AISTATS.