[1] Peter Auer, et al. UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem, 2010, Period. Math. Hung.
[2] Qing Zhao, et al. Adaptive shortest-path routing under unknown and stochastically varying link states, 2012, 2012 10th International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt).
[3] Bhaskar Krishnamachari, et al. Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations, 2010, IEEE/ACM Transactions on Networking.
[4] Benjamin Van Roy, et al. Learning to Optimize via Posterior Sampling, 2013, Math. Oper. Res.
[5] Tamir Hazan, et al. Tight Bounds for Bandit Combinatorial Optimization, 2017, COLT.
[6] Gábor Lugosi, et al. Minimax Policies for Combinatorial Prediction Games, 2011, COLT.
[7] Alessandro Lazaric, et al. Linear Thompson Sampling Revisited, 2016, AISTATS.
[8] Yasin Abbasi-Yadkori. Forced-Exploration Based Algorithms for Playing in Stochastic Linear Bandits, 2009.
[9] Peter Auer, et al. Using Confidence Bounds for Exploitation-Exploration Trade-offs, 2003, J. Mach. Learn. Res.
[10] Kin K. Leung, et al. Identifiability of link metrics based on end-to-end path measurements, 2013, Internet Measurement Conference.
[11] Gábor Lugosi, et al. Regret in Online Combinatorial Optimization, 2012, Math. Oper. Res.
[12] Elad Hazan, et al. Volumetric Spanners: An Efficient Exploration Basis for Learning, 2013, J. Mach. Learn. Res.
[13] Baruch Awerbuch, et al. Online linear optimization and adaptive routing, 2008, J. Comput. Syst. Sci.
[14] Wei Chen, et al. Combinatorial multi-armed bandit: General framework, results and applications, 2013, ICML.
[15] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[16] Aleksandrs Slivkins. Introduction to Multi-Armed Bandits, 2019, Found. Trends Mach. Learn.
[17] Richard Bellman. On a Routing Problem, 1958.
[18] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules, 1985.
[19] J. Bather, et al. Multi-Armed Bandit Allocation Indices, 1990.
[20] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.
[21] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.
[22] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.
[23] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[24] John N. Tsitsiklis, et al. Linearly Parameterized Bandits, 2008, Math. Oper. Res.
[25] Wei Chu, et al. Contextual Bandits with Linear Payoff Functions, 2011, AISTATS.
[26] Edsger W. Dijkstra. A note on two problems in connexion with graphs, 1959, Numerische Mathematik.
[27] Sébastien Bubeck, et al. The entropic barrier: a simple and optimal universal self-concordant barrier, 2014, COLT.
[28] Jianqing Fan, et al. High-Dimensional Statistics, 2014.
[29] Zheng Wen, et al. Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits, 2014, AISTATS.
[30] Xin-She Yang, et al. Introduction to Algorithms, 2021, Nature-Inspired Optimization Algorithms.
[31] Shipra Agrawal, et al. Thompson Sampling for Contextual Bandits with Linear Payoffs, 2012, ICML.
[32] Richard Combes, et al. Stochastic Online Shortest Path Routing: The Value of Feedback, 2013, IEEE Transactions on Automatic Control.