Stochastic Contextual Bandits with Long Horizon Rewards
[1] Robert D. Kleinberg, et al. Online Convex Optimization with Unbounded Memory, 2022, ArXiv.
[2] S. Filippi, et al. Delayed Feedback in Generalised Linear Bandits Revisited, 2022, AISTATS.
[3] Samet Oymak, et al. Representation Learning for Context-Dependent Decision-Making, 2022, American Control Conference (ACC).
[4] Samet Oymak, et al. Non-Stationary Representation Learning in Sequential Linear Bandits, 2022, IEEE Open Journal of Control Systems.
[5] Cheng Soon Ong, et al. Gaussian Process Bandits with Aggregated Feedback, 2021, AAAI.
[6] Yishay Mansour, et al. Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions, 2021, ICML.
[7] Pieter Abbeel, et al. Decision Transformer: Reinforcement Learning via Sequence Modeling, 2021, NeurIPS.
[8] Tor Lattimore, et al. Information Directed Sampling for Sparse Linear Bandits, 2021, NeurIPS.
[9] Longbo Huang, et al. Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback, 2020, AAAI.
[10] Tor Lattimore, et al. High-Dimensional Sparse Linear Bandits, 2020, NeurIPS.
[11] A. Proutière, et al. Thresholded LASSO Bandit, 2020, ICML.
[12] Pooria Joulani, et al. Adapting to Delays and Data in Adversarial Multi-Armed Bandits, 2020, ICML.
[13] R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science, 2018.
[14] Zhengyuan Zhou, et al. Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits, 2020, Management Science.
[15] Min-hwan Oh, et al. Sparsity-Agnostic Lasso Bandit, 2020, ICML.
[16] Csaba Szepesvári, et al. Bandit Algorithms, 2020.
[17] Michal Valko, et al. Stochastic bandits with arm-dependent delays, 2020, ICML.
[18] Mark Chen, et al. Language Models are Few-Shot Learners, 2020, NeurIPS.
[19] Soon-Jo Chung, et al. Online Optimization with Memory and Competitive Control, 2020, NeurIPS.
[20] N. Cesa-Bianchi, et al. Stochastic Bandits with Delay-Dependent Payoffs, 2019, AISTATS.
[21] Aditya Kumar Akash, et al. Stochastic Bandits with Delayed Composite Anonymous Feedback, 2019, ArXiv.
[22] Julian Zimmert, et al. An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays, 2019, AISTATS.
[23] Gi-Soo Kim, et al. Doubly-Robust Lasso Bandit, 2019, NeurIPS.
[24] Nicolò Cesa-Bianchi, et al. Nonstochastic Multiarmed Bandits with Unrestricted Delays, 2019, NeurIPS.
[25] Michael I. Jordan, et al. A Short Note on Concentration Inequalities for Random Vectors with SubGaussian Norm, 2019, ArXiv.
[26] Georgios B. Giannakis, et al. Bandit Online Learning with Unknown Delays, 2018, AISTATS.
[27] Tor Lattimore, et al. Linear Bandits with Stochastic Delayed Feedback, 2018, ICML.
[28] Claudio Gentile, et al. Nonstochastic Bandits with Composite Anonymous Feedback, 2018, COLT.
[29] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[30] Csaba Szepesvári, et al. Bandits with Delayed, Aggregated Anonymous Feedback, 2017, ICML.
[31] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[32] Lihong Li, et al. Provably Optimal Algorithms for Generalized Linear Contextual Bandits, 2017, ArXiv.
[33] Claudio Gentile, et al. Delay and Cooperation in Nonstochastic Bandits, 2016, COLT.
[34] Justin K. Romberg, et al. An Overview of Low-Rank Matrix Recovery From Incomplete Observations, 2016, IEEE Journal of Selected Topics in Signal Processing.
[35] Shie Mannor, et al. Online Learning for Adversaries with Memory: Price of Past Mistakes, 2015, NIPS.
[36] Mohsen Bayati, et al. Online Decision-Making with High-Dimensional Covariates, 2015.
[37] R. Adamczak, et al. A note on the Hanson-Wright inequality for random vectors with dependencies, 2014, arXiv:1409.8457.
[38] Holger Rauhut, et al. A Mathematical Introduction to Compressive Sensing, 2013, Applied and Numerical Harmonic Analysis.
[39] M. Rudelson, et al. Hanson-Wright inequality and sub-gaussian concentration, 2013.
[40] Yonina C. Eldar, et al. Simultaneously Structured Models With Application to Sparse and Low-Rank Matrices, 2012, IEEE Transactions on Information Theory.
[41] Holger Rauhut, et al. Suprema of Chaos Processes and the Restricted Isometry Property, 2012, ArXiv.
[42] Nicolas Vayatis, et al. Estimation of Simultaneously Sparse and Low Rank Matrices, 2012, ICML.
[43] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[44] Csaba Szepesvári, et al. Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits, 2012, AISTATS.
[45] Rémi Munos, et al. Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit, 2012, AISTATS.
[46] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[47] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[48] Wei Chu, et al. A contextual-bandit approach to personalized news article recommendation, 2010, WWW.
[49] Massimo Fornasier, et al. Theoretical Foundations and Numerical Methods for Sparse Recovery, 2010, Radon Series on Computational and Applied Mathematics.
[50] P. Bickel, et al. Simultaneous Analysis of Lasso and Dantzig Selector, 2008, arXiv:0801.1095.
[51] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[52] E. Candès, et al. The Dantzig selector: Statistical estimation when p is much larger than n, 2005, arXiv:math/0506081.
[53] M. Woodroofe. A One-Armed Bandit Problem with a Concomitant Variable, 1979.
[54] P. Wedin. Perturbation bounds in connection with singular value decomposition, 1972.
[55] Ken-ichi Kawarabayashi, et al. Delay and Cooperation in Nonstochastic Linear Bandits, 2020, NeurIPS.
[56] Xi Chen, et al. Online EXP3 Learning in Adversarial Bandits with Delayed Feedback, 2019, NeurIPS.
[57] Renyuan Xu, et al. Learning in Generalized Linear Contextual Bandits with Stochastic Delays, 2019, NeurIPS.
[58] Xue Wang, et al. Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates, 2018, ICML.