Weitong Zhang | Dongruo Zhou | Lihong Li | Quanquan Gu
[1] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[2] Jianfeng Gao, et al. BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems, 2016, AAAI.
[3] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[4] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985, Advances in Applied Mathematics.
[5] Lihong Li, et al. An Empirical Evaluation of Thompson Sampling, 2011, NIPS.
[6] Julien Mairal, et al. On the Inductive Bias of Neural Tangent Kernels, 2019, NeurIPS.
[7] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[8] Yuan Cao, et al. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks, 2018, arXiv.
[9] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, arXiv.
[10] Long Tran-Thanh, et al. Efficient Thompson Sampling for Online Matrix-Factorization Recommendation, 2015, NIPS.
[11] Lihong Li, et al. Provable Optimal Algorithms for Generalized Linear Contextual Bandits, 2017, arXiv.
[12] Aurélien Garivier, et al. Parametric Bandits: The Generalized Linear Case, 2010, NIPS.
[13] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[14] Csaba Szepesvári, et al. Bandit Algorithms, 2020.
[15] W. R. Thompson. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[16] Benjamin Van Roy, et al. A Tutorial on Thompson Sampling, 2017, Found. Trends Mach. Learn.
[17] Wei Chu, et al. A Contextual-Bandit Approach to Personalized News Article Recommendation, 2010, WWW '10.
[18] Shie Mannor, et al. Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching, 2019, arXiv.
[19] Nello Cristianini, et al. Finite-Time Analysis of Kernelised Contextual Bandits, 2013, UAI.
[20] Robert D. Nowak, et al. Scalable Generalized Linear Bandits: Online Computation and Hashing, 2017, NIPS.
[21] Rémi Munos, et al. Spectral Thompson Sampling, 2014, AAAI.
[22] Julien Cornebise, et al. Weight Uncertainty in Neural Networks, 2015, ICML.
[23] Yuan Cao, et al. Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks, 2019, NeurIPS.
[24] Alessandro Lazaric, et al. Linear Thompson Sampling Revisited, 2016, AISTATS.
[25] Benjamin Van Roy, et al. Deep Exploration via Bootstrapped DQN, 2016, NIPS.
[26] Mengdi Wang, et al. Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound, 2019, ICML.
[27] Yuan Cao, et al. Towards Understanding the Spectral Bias of Deep Learning, 2021, IJCAI.
[28] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[29] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[30] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[31] Jasper Snoek, et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.
[32] Hsiu-Chin Lin, et al. Learning Task Constraints in Operational Space Formulation, 2017, IEEE International Conference on Robotics and Automation (ICRA).
[33] Michael I. Jordan, et al. Advances in Neural Information Processing Systems 30, 1995.
[34] Shuichi Wakimoto, et al. Research Trends in Fluid Power Technology at the IEEE International Conference on Robotics and Automation (ICRA), 2011.
[35] R. Agrawal. Sample Mean Based Index Policies by O(log n) Regret for the Multi-Armed Bandit Problem, 1995, Advances in Applied Probability.
[36] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[37] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[38] Quanquan Gu, et al. Neural Contextual Bandits with UCB-based Exploration, 2019, ICML.
[39] Alexander Rakhlin, et al. Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles, 2020, ICML.
[40] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[41] Matthew W. Hoffman, et al. Exploiting Correlation and Budget Constraints in Bayesian Multi-armed Bandit Optimization, 2013, arXiv:1303.6746.
[42] Yuanzhi Li, et al. What Can ResNet Learn Efficiently, Going Beyond Kernels?, 2019, NeurIPS.
[43] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[44] Aditya Gopalan, et al. On Kernelized Multi-armed Bandits, 2017, ICML.
[45] Mathieu Aubry, et al. Dex-Net 1.0: A Cloud-Based Network of 3D Objects for Robust Grasp Planning Using a Multi-Armed Bandit Model with Correlated Rewards, 2016, IEEE International Conference on Robotics and Automation (ICRA).
[46] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.
[47] Yoshua Bengio, et al. Boosting Neural Networks, 2000, Neural Computation.
[48] Kamyar Azizzadenesheli, et al. Efficient Exploration Through Bayesian Deep Q-Networks, 2018, Information Theory and Applications Workshop (ITA).
[49] Benjamin Van Roy, et al. Bootstrapped Thompson Sampling and Deep Exploration, 2015, arXiv.
[50] Kristjan H. Greenewald, et al. Action Centered Contextual Bandits, 2017, NIPS.
[51] Shun-ichi Amari. Understand in 5 Minutes!? Skimming a Famous Paper: Jacot, Arthur, Gabriel, Franck, and Hongler, Clément: Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2020.
[52] Joaquin Quiñonero Candela, et al. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine, 2010, ICML.
[53] Craig Boutilier, et al. Randomized Exploration in Generalized Linear Bandits, 2019, AISTATS.
[54] Shane Legg, et al. Noisy Networks for Exploration, 2017, ICLR.
[55] Benjamin Van Roy, et al. Ensemble Sampling, 2017, NIPS.