Alessandro Lazaric | Marcello Restelli | Andrea Tirinzoni
[1] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[2] Tor Lattimore, et al. The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits, 2016, AISTATS.
[3] Alexandre Proutière, et al. Learning to Rank, 2015, SIGMETRICS.
[4] Nicolò Cesa-Bianchi, et al. Combinatorial Bandits, 2012, COLT.
[5] Tor Lattimore, et al. Bounded Regret for Finite-Armed Structured Bandits, 2014, NIPS.
[6] Wouter M. Koolen, et al. Non-Asymptotic Pure Exploration by Solving Games, 2019, NeurIPS.
[7] Cong Shen, et al. Regional Multi-Armed Bandits, 2018, AISTATS.
[8] A. Burnetas, et al. Optimal Adaptive Policies for Sequential Allocation Problems, 1996.
[9] T. L. Graves, et al. Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains, 1997.
[10] Alexandre Proutière, et al. Minimal Exploration in Structured Stochastic Bandits, 2017, NIPS.
[11] Aurélien Garivier, et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond, 2011, COLT.
[12] R. Agrawal, et al. Asymptotically Efficient Adaptive Allocation Schemes for Controlled Markov Chains: Finite Parameter Space, 1989.
[13] Peter Auer, et al. UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem, 2010, Period. Math. Hung.
[14] Vianney Perchet, et al. Bandits with Side Observations: Bounded vs. Logarithmic Regret, 2018, UAI.
[15] Tor Lattimore, et al. Adaptive Exploration in Linear Contextual Bandit, 2020, AISTATS.
[16] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[17] Alessandro Lazaric, et al. Linear Thompson Sampling Revisited, 2016, AISTATS.
[18] Samarth Gupta, et al. Exploiting Correlation in Finite-Armed Structured Bandits, 2018, arXiv.
[19] Pierre Ménard, et al. Gradient Ascent for Active Exploration in Bandit Problems, 2019, arXiv.
[20] Shie Mannor, et al. Unimodal Bandits, 2011, ICML.
[21] Onur Atan, et al. Global Bandits, 2015, IEEE Transactions on Neural Networks and Learning Systems.
[22] Alexandre Proutière, et al. Lipschitz Bandits: Regret Lower Bound and Optimal Algorithms, 2014, COLT.
[23] Vianney Perchet, et al. Bounded Regret in Stochastic Multi-Armed Bandits, 2013, COLT.
[24] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[25] W. R. Thompson. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.
[26] Aurélien Garivier, et al. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, 2016, Math. Oper. Res.
[27] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[28] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.
[29] Alessandro Lazaric, et al. Sequential Transfer in Multi-armed Bandit with Finite Set of Models, 2013, NIPS.