[1] I. M. Verloop. Asymptotically optimal priority policies for indexable and nonindexable restless bandits, 2016, arXiv:1609.00563.
[2] Ambuj Tewari, et al. Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems, 2019, NeurIPS.
[3] P. Whittle. Restless Bandits: Activity Allocation in a Changing World, 1988.
[4] E. Altman. Constrained Markov Decision Processes, 1999.
[5] Jian Li, et al. Learning Augmented Index Policy for Optimal Service Placement at the Network Edge, 2021, arXiv.
[6] Csaba Szepesvari, et al. Bandit Algorithms, 2020.
[7] Yishay Mansour, et al. Online Convex Optimization in Adversarial Markov Decision Processes, 2019, ICML.
[8] Lang Tong, et al. Deadline scheduling as restless bandits, 2016, 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[9] Christian Timmerer, et al. Dynamic adaptive streaming over HTTP dataset, 2012, MMSys '12.
[10] Milind Tambe, et al. Beyond "To Act or Not to Act": Fast Lagrangian Approaches to General Multi-Action Restless Bandits, 2021, AAMAS.
[11] Longbo Huang, et al. Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits, 2020, NeurIPS.
[12] Peter Auer, et al. Near-optimal Regret Bounds for Reinforcement Learning, 2008, J. Mach. Learn. Res.
[13] Yishay Mansour, et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes, 1999, Machine Learning.
[14] P. R. Kumar, et al. Reward Biased Maximum Likelihood Estimation for Reinforcement Learning, 2021, L4DC.
[15] Shie Mannor, et al. Exploration-Exploitation in Constrained MDPs, 2020, arXiv.
[16] Mingyan Liu, et al. Data-Driven Channel Modeling Using Spectrum Measurement, 2015, IEEE Transactions on Mobile Computing.
[17] Panganamala Ramana Kumar, et al. Optimizing quality of experience of dynamic video streaming over fading wireless networks, 2015, 54th IEEE Conference on Decision and Control (CDC).
[18] Dimitri P. Bertsekas, et al. Dynamic Programming and Optimal Control, Two Volume Set, 1995.
[19] Demosthenis Teneketzis, et al. Multi-Armed Bandit Problems, 2008.
[20] Albert N. Shiryaev, et al. Optimal Stopping Rules, 2011, International Encyclopedia of Statistical Science.
[21] Ambuj Tewari, et al. Thompson Sampling in Non-Episodic Restless Bandits, 2019, arXiv.
[22] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.
[23] Srinivas Shakkottai, et al. Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs, 2021, AAAI.
[24] L. Kallenberg. Finite State and Action MDPs, 2003.
[25] Massimiliano Pontil, et al. Empirical Bernstein Bounds and Sample-Variance Penalization, 2009, COLT.
[26] Dimitris Bertsimas, et al. Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic, 2000, Oper. Res.
[27] P. Schrimpf, et al. Dynamic Programming, 2011.
[28] Qing Zhao, et al. Logarithmic weak regret of non-Bayesian restless multi-armed bandit, 2011, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.
[30] John N. Tsitsiklis, et al. The complexity of optimal queueing network control, 1994, Proceedings of IEEE 9th Annual Conference on Structure in Complexity Theory.
[31] José Niño-Mora. Restless Bandits, Partial Conservation Laws and Indexability, 2000.
[32] Haipeng Luo, et al. Learning Adversarial MDPs with Bandit Feedback and Unknown Transition, 2019, arXiv.
[33] Qing Zhao, et al. Learning in a Changing World: Restless Multiarmed Bandit With Unknown Dynamics, 2010, IEEE Transactions on Information Theory.
[34] Gabriel Zayas-Cabán, et al. An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits, 2019, Advances in Applied Probability.
[35] Sarang Deo, et al. Improving Health Outcomes Through Better Capacity Allocation in a Community-Based Chronic Care Model, 2013, Oper. Res.
[36] Peter Auer, et al. Regret bounds for restless Markov bandits, 2012, Theor. Comput. Sci.
[37] Mingyan Liu, et al. Optimality of Myopic Sensing in Multi-Channel Opportunistic Access, 2008, IEEE International Conference on Communications.
[38] R. Weber, et al. On an index policy for restless bandits, 1990, Journal of Applied Probability.
[39] Peter I. Frazier, et al. Restless Bandits with Many Arms: Beating the Central Limit Theorem, 2021, arXiv.
[40] P. Frazier, et al. An Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits, 2017, arXiv:1707.00205.
[41] Mingyan Liu, et al. Online Learning of Rested and Restless Bandits, 2011, IEEE Transactions on Information Theory.
[42] Andrew Perrault, et al. Risk-Aware Interventions in Public Health: Planning with Restless Multi-Armed Bandits, 2021, AAMAS.
[43] O. Hernández-Lerma, et al. Further topics on discrete-time Markov control processes, 1999.
[44] José Niño-Mora. Dynamic priority allocation via restless bandit marginal productivity indices, 2007, arXiv:2304.06115.
[45] David K. Smith, et al. Dynamic Programming and Optimal Control, Volume 1, 1996.
[46] David B. Brown, et al. Index Policies and Performance Bounds for Dynamic Selection Problems, 2020, Manag. Sci.
[47] Pierluigi Nuzzo, et al. A Sample-Efficient Algorithm for Episodic Finite-Horizon MDP with Constraints, 2020, AAAI.
[48] Mariel S. Lavieri, et al. Optimal Screening for Hepatocellular Carcinoma: A Restless Bandit Model, 2019, Manuf. Serv. Oper. Manag.
[49] Mingyan Liu, et al. Adaptive learning of uncontrolled restless bandits with logarithmic regret, 2011, 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[50] W. Hoeffding. Probability Inequalities for Sums of Bounded Random Variables, 1963.
[51] Wenhan Dai, et al. The non-Bayesian restless multi-armed bandit: A case of near-logarithmic regret, 2011, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).