Learning Unknown Service Rates in Queues: A Multiarmed Bandit Approach
[1] Bhaskar Krishnamachari,et al. Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations , 2010, IEEE/ACM Transactions on Networking.
[2] Urtzi Ayesta,et al. Dynamic Control of Birth-and-Death Restless Bandits: Application to Resource-Allocation Problems , 2016, IEEE/ACM Transactions on Networking.
[3] Nicolò Cesa-Bianchi,et al. Combinatorial Bandits , 2012, COLT.
[4] Demosthenis Teneketzis,et al. Multi-Armed Bandit Problems , 2008 .
[5] Ward Whitt,et al. Heavy Traffic Limit Theorems for Queues: A Survey , 1974 .
[6] Shipra Agrawal,et al. Analysis of Thompson Sampling for the Multi-armed Bandit Problem , 2011, COLT.
[7] José Niño-Mora,et al. Marginal productivity index policies for scheduling a multiclass delay-/loss-sensitive queue , 2006, Queueing Syst. Theory Appl..
[8] W. R. Thompson. ON THE LIKELIHOOD THAT ONE UNKNOWN PROBABILITY EXCEEDS ANOTHER IN VIEW OF THE EVIDENCE OF TWO SAMPLES , 1933 .
[9] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[10] J. Gittins. Bandit processes and dynamic allocation indices , 1979 .
[11] José Niño-Mora,et al. Dynamic priority allocation via restless bandit marginal productivity indices , 2007, 2304.06115.
[12] Vianney Perchet,et al. Batched Bandit Problems , 2015, COLT.
[13] Jean-Yves Audibert,et al. Lower bounds and selectivity of weak-consistent policies in stochastic multi-armed bandit problem , 2013, J. Mach. Learn. Res..
[14] Benjamin Van Roy,et al. Learning to Optimize via Posterior Sampling , 2013, Math. Oper. Res..
[15] Csaba Szepesvári,et al. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits , 2009, Theor. Comput. Sci..
[16] Alexandre B. Tsybakov,et al. Introduction to Nonparametric Estimation , 2008, Springer series in statistics.
[17] Jean Walrand,et al. The cμ rule revisited , 1985 .
[18] Vianney Perchet,et al. Bounded regret in stochastic multi-armed bandits , 2013, COLT.
[19] Steven L. Scott,et al. A modern Bayesian look at the multi-armed bandit , 2010 .
[20] Sanjay Shakkottai,et al. Regret of Queueing Bandits , 2016, NIPS.
[21] Demosthenis Teneketzis,et al. ON THE OPTIMALITY OF AN INDEX RULE IN MULTICHANNEL ALLOCATION FOR SINGLE-HOP MOBILE NETWORKS WITH MULTIPLE SERVICE CLASSES , 2000 .
[22] Vianney Perchet,et al. Combinatorial semi-bandit with known covariance , 2016, NIPS.
[23] H. Kushner. Heavy Traffic Analysis of Controlled Queueing and Communication Networks , 2001 .
[24] Alexandre Proutière,et al. Combinatorial Bandits Revisited , 2015, NIPS.
[25] J. V. Mieghem. Dynamic Scheduling with Convex Delay Costs: The Generalized cμ Rule , 1995 .
[26] R. Srikant,et al. Bandits with Budgets , 2015, SIGMETRICS.
[27] Peter Auer,et al. Regret bounds for restless Markov bandits , 2012, Theor. Comput. Sci..
[28] José Niño-Mora,et al. Admission and routing of soft real-time jobs to multiclusters: Design and comparison of index policies , 2012, Comput. Oper. Res..
[29] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res..
[30] Urtzi Ayesta,et al. Scheduling of multi-class multi-server queueing systems with abandonments , 2017, J. Sched..
[31] P. Whittle. Restless Bandits: Activity Allocation in a Changing World , 1988 .
[32] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[33] P. Whittle. Restless bandits: activity allocation in a changing world , 1988, Journal of Applied Probability.
[34] Lei Ying,et al. Communication Networks - An Optimization, Control, and Stochastic Networks Perspective , 2014 .
[35] José Niño Mora. Marginal productivity index policies for scheduling a multiclass delay-/loss-sensitive queue , 2005 .
[36] Aurélien Garivier,et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond , 2011, COLT.
[37] P. Jacko,et al. Congestion control of TCP flows in Internet routers by means of index policy , 2012, Comput. Networks.
[38] Rémi Munos,et al. Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis , 2012, ALT.
[39] Michael J. Neely,et al. Stability and Capacity Regions for Discrete Time Queueing Networks , 2010, ArXiv.