Srinivas Shakkottai | Ping-Chun Hsieh | I-Hong Hou | Santosh Ganji | Khaled Nakhleh
[1] Qing Zhao et al. Multi-Armed Bandits: Theory and Applications to Online Learning in Networks, 2019, Multi-Armed Bandits.
[2] Vivek S. Borkar et al. Learning Algorithms for Markov Decision Processes with Average Cost, 2001, SIAM J. Control Optim.
[3] Tor Lattimore et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning, 2017, NIPS.
[4] P. Whittle. Restless Bandits: Activity Allocation in a Changing World, 1988.
[5] José Niño-Mora et al. Dynamic priority allocation via restless bandit marginal productivity indices, 2007, arXiv:2304.06115.
[6] Vivek S. Borkar et al. A reinforcement learning algorithm for restless bandits, 2018, 2018 Indian Control Conference (ICC).
[7] Nan Jiang et al. Contextual Decision Processes with low Bellman rank are PAC-Learnable, 2016, ICML.
[8] D. Manjunath et al. On the Whittle Index for Restless Multiarmed Hidden Markov Bandits, 2016, IEEE Transactions on Automatic Control.
[9] Ronald J. Williams et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 2004, Machine Learning.
[10] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[11] Ling Shi et al. Deep Reinforcement Learning for Wireless Sensor Scheduling in Cyber-Physical Systems, 2018, Autom.
[12] Tomi Silander et al. When are Kalman-Filter Restless Bandits Indexable?, 2015, NIPS.
[13] Bhaskar Krishnamachari et al. Deep Reinforcement Learning for Dynamic Multichannel Access in Wireless Networks, 2018, IEEE Transactions on Cognitive Communications and Networking.
[14] Steffen Grünewälder et al. Recovering Bandits, 2019, NeurIPS.
[15] Richard S. Sutton et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[16] Jasper Snoek et al. Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, 2018, ICLR.
[17] Richard Evans et al. Deep Reinforcement Learning in Large Discrete Action Spaces, 2015, arXiv:1512.07679.
[18] John N. Tsitsiklis et al. The Complexity of Optimal Queuing Network Control, 1999, Math. Oper. Res.
[19] E. Feron et al. Multi-UAV dynamic routing with partial observations using restless bandit allocation indices, 2008, 2008 American Control Conference.
[20] Vivek S. Borkar et al. A learning algorithm for the Whittle index policy for scheduling web crawlers, 2019, 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[21] Lang Tong et al. Deadline scheduling as restless bandits, 2016, 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[22] Eli Upfal et al. Adapting to a Changing Environment: the Brownian Restless Bandits, 2008, COLT.
[23] Samuli Aalto et al. Whittle Index Approach to Size-aware Scheduling with Time-varying Channels, 2015, SIGMETRICS.
[24] V. Borkar et al. Whittle index based Q-learning for restless bandits with average reward, 2020, Autom.
[25] Dimitri P. Bertsekas et al. Convergence Results for Some Temporal Difference Methods Based on Least Squares, 2009, IEEE Transactions on Automatic Control.
[26] Pradeep Varakantham et al. Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare, 2021, IJCAI.
[27] Eytan Modiano et al. A Whittle Index Approach to Minimizing Functions of Age of Information, 2019, 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton).
[28] Andriy Mnih et al. Q-Learning in enormous action spaces via amortized approximate maximization, 2020, arXiv.
[29] Emma Brunskill et al. Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs, 2018, ICML.
[30] Peter G. Taylor et al. Towards Q-learning the Whittle Index for Restless Bandits, 2019, 2019 Australian & New Zealand Control Conference (ANZCC).
[31] Alessandro Lazaric et al. A single algorithm for both restless and rested rotting bandits, 2020, AISTATS.
[32] Lantao Yu et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[33] Thorsten Joachims et al. MOReL: Model-Based Offline Reinforcement Learning, 2020, NeurIPS.
[34] Tianshu Wei et al. Deep reinforcement learning for building HVAC control, 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).