Offline Neural Contextual Bandits: Pessimism, Optimization and Generalization
[1] Shane Legg, et al. Human-level control through deep reinforcement learning, 2015, Nature.
[2] Svetha Venkatesh, et al. On Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks in Besov Spaces, 2021, Trans. Mach. Learn. Res.
[3] Xinkun Nie, et al. Learning When-to-Treat Policies, 2019, Journal of the American Statistical Association.
[4] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[5] Banghua Zhu, et al. Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism, 2021, IEEE Transactions on Information Theory.
[6] Lihong Li, et al. Learning from Logged Implicit Exploration Data, 2010, NIPS.
[7] J. Langford, et al. The Epoch-Greedy algorithm for contextual multi-armed bandits, 2007, NIPS.
[8] Quanquan Gu, et al. Neural Contextual Bandits with UCB-based Exploration, 2019, ICML.
[9] Boris Hanin, et al. Finite Depth and Width Corrections to the Neural Tangent Kernel, 2019, ICLR.
[10] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[11] Doina Precup, et al. Off-Policy Deep Reinforcement Learning without Exploration, 2018, ICML.
[12] John Langford, et al. Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits, 2014, ICML.
[13] Alessandro Lazaric, et al. Leveraging Good Representations in Linear Contextual Bandits, 2021, ICML.
[14] Chi Jin, et al. Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning, 2021, ICML.
[15] Claudio Gentile, et al. On the generalization ability of on-line learning algorithms, 2001, IEEE Transactions on Information Theory.
[16] Yu Bai, et al. Near Optimal Provable Uniform Convergence in Off-Policy Evaluation for Reinforcement Learning, 2021, AISTATS.
[17] Andreas Krause, et al. Contextual Gaussian Process Bandit Optimization, 2011, NIPS.
[18] Nan Jiang, et al. Information-Theoretic Considerations in Batch Reinforcement Learning, 2019, ICML.
[19] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[20] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[21] Csaba Szepesvári, et al. Finite-Time Bounds for Fitted Value Iteration, 2008, J. Mach. Learn. Res.
[22] Handong Zhao, et al. Neural Contextual Bandits with Deep Representation and Shallow Exploration, 2020, ICLR.
[23] Mengdi Wang, et al. Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation, 2020, ICML.
[24] Csaba Szepesvári, et al. X-Armed Bandits, 2011, J. Mach. Learn. Res.
[25] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[26] Svetha Venkatesh, et al. Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support, 2021, ArXiv.
[27] Mengdi Wang, et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound, 2019, ICML.
[28] Toru Kitagawa, et al. Who should be Treated? Empirical Welfare Maximization Methods for Treatment Choice, 2015.
[29] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[30] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[31] Quynh Nguyen, et al. On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths, 2021, ICML.
[32] Martin A. Riedmiller, et al. Batch Reinforcement Learning, 2012, Reinforcement Learning.
[33] Andreas Krause, et al. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting, 2009, IEEE Transactions on Information Theory.
[34] Michael I. Jordan, et al. On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces, 2021.
[35] Yuan Cao, et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks, 2019, NeurIPS.
[36] S. Levine, et al. Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, 2020, ArXiv.
[37] Fredrik D. Johansson, et al. Guidelines for reinforcement learning in healthcare, 2019, Nature Medicine.
[38] Yu-Xiang Wang, et al. Characterizing Uniform Convergence in Offline Policy Evaluation via model-based approach: Offline Learning, Task-Agnostic and Reward-Free, 2021.
[39] Jasper Snoek, et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.
[40] Yu-Xiang Wang, et al. Asymptotically Efficient Off-Policy Evaluation for Tabular Reinforcement Learning, 2020, AISTATS.
[41] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[42] Csaba Szepesvári, et al. Bandit Algorithms, 2020.
[43] Nello Cristianini, et al. Finite-Time Analysis of Kernelised Contextual Bandits, 2013, UAI.
[44] Masatoshi Uehara, et al. Fast Rates for the Regret of Offline Reinforcement Learning, 2021, COLT.
[45] Mikhail Belkin, et al. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation, 2021, Acta Numerica.
[46] Rajesh Ranganath, et al. Bandit Overfitting in Offline Policy Learning, 2020.
[47] Philip S. Thomas, et al. Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing, 2017, AAAI.
[48] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[49] Tor Lattimore, et al. On the Optimality of Batch Policy Optimization Algorithms, 2021, ICML.
[50] Stefan Wager, et al. Policy Learning With Observational Data, 2017, Econometrica.
[51] Eli Upfal, et al. Multi-Armed Bandits in Metric Spaces, 2008.
[52] Zhuoran Yang, et al. Is Pessimism Provably Efficient for Offline RL?, 2020, ICML.