Continuous-in-time Limit for Bayesian Bandits
[1] R. Kohn,et al. A New Approach to Drifting Games, Based on Asymptotically Optimal Potentials , 2022, ArXiv.
[2] Vladimir A. Kobzar,et al. A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit , 2022, ArXiv.
[3] Peter W. Glynn,et al. Diffusion Approximations for Thompson Sampling , 2021, ArXiv.
[4] Lexing Ying,et al. A Note on Optimization Formulations of Markov Decision Processes , 2020, arXiv:2012.09417.
[5] Grant M. Rotskoff,et al. A Dynamical Central Limit Theorem for Shallow Neural Networks , 2020, NeurIPS.
[6] Csaba Szepesvari,et al. Bandit Algorithms , 2020 .
[7] Albert Y. Zomaya,et al. Partial Differential Equations , 2007, Explorations in Numerical Analysis.
[8] Rene Caldentey,et al. Diffusion Approximations for a Class of Sequential Experimentation Problems , 2019, Manag. Sci..
[9] Yonatan Gur,et al. Sequential Procurement with Contractual and Experimental Learning , 2019, Manag. Sci..
[10] Matthieu Geist,et al. A Theory of Regularized Markov Decision Processes , 2019, ICML.
[11] E Weinan,et al. A mean-field optimal control formulation of deep learning , 2018, Research in the Mathematical Sciences.
[12] David Duvenaud,et al. Neural Ordinary Differential Equations , 2018, NeurIPS.
[13] Yeon-Koo Che,et al. Recommender Systems as Mechanisms for Social Learning , 2018 .
[14] Andrea Montanari,et al. A mean field view of the landscape of two-layer neural networks , 2018, Proceedings of the National Academy of Sciences.
[15] David Simchi-Levi,et al. Online Network Revenue Management Using Thompson Sampling , 2017, Oper. Res..
[16] Qiang Liu,et al. Stein Variational Gradient Descent as Gradient Flow , 2017, NIPS.
[17] Demis Hassabis,et al. Mastering the game of Go with deep neural networks and tree search , 2016, Nature.
[18] E Weinan,et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms , 2015, ICML.
[19] Tor Lattimore,et al. Regret Analysis of the Finite-Horizon Gittins Index Strategy for Multi-Armed Bandits , 2015, COLT.
[20] Stephen P. Boyd,et al. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights , 2014, J. Mach. Learn. Res..
[21] Rong Zheng,et al. Sequential Learning for Multi-Channel Wireless Network Monitoring With Channel Switching Costs , 2014, IEEE Transactions on Signal Processing.
[22] Aurélien Garivier,et al. On Bayesian Upper Confidence Bounds for Bandit Problems , 2012, AISTATS.
[23] Yee Whye Teh,et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics , 2011, ICML.
[24] Rina Panigrahy,et al. Prediction strategies without loss , 2010, NIPS.
[25] M. Mohri,et al. Bandit Problems , 2006 .
[26] N. Karoui,et al. Optimal portfolio management with American capital guarantee , 2005 .
[27] Benoît Leloup,et al. Dynamic Pricing on the Internet: Theory and Simulations , 2001, Electron. Commer. Res..
[28] P. W. Jones,et al. Bandit Problems, Sequential Allocation of Experiments , 1987 .
[29] Donald A. Berry,et al. Bandit Problems: Sequential Allocation of Experiments. , 1986 .
[30] H. Robbins,et al. Asymptotically efficient adaptive allocation rules , 1985 .
[31] R. N. Bradt,et al. On Sequential Designs for Maximizing the Sum of $n$ Observations , 1956 .
[32] H. Robbins. Some aspects of the sequential design of experiments , 1952 .
[33] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[34] J. Andel. Sequential Analysis , 2022, The SAGE Encyclopedia of Research Design.
[35] Kuang Xu,et al. Diffusion Asymptotics for Sequential Experiments , 2021, ArXiv.
[36] Ambuj Tewari,et al. From Ads to Interventions: Contextual Bandits in Mobile Health , 2017, Mobile Health - Sensors, Analytic Methods, and Applications.
[37] Csaba Szepesvari,et al. Regularization in reinforcement learning , 2011 .
[38] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[39] Jati K. Sengupta,et al. Stochastic Control Theory , 1997 .
[40] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[41] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 2022 .