Milind Tambe | Arpita Biswas | Lily Xu | Jackson A. Killian
[1] Craig Boutilier,et al. Minimax regret based elicitation of generalized additive utilities , 2007, UAI.
[2] Abhinav Gupta,et al. Robust Adversarial Reinforcement Learning , 2017, ICML.
[3] Bhaskar Krishnamachari,et al. Restless Poachers: Handling Exploration-Exploitation Tradeoffs in Security Domains , 2016, AAMAS.
[4] Diego Ruiz-Hernández,et al. Multi-machine preventive maintenance scheduling with imperfect interventions: A restless bandit approach , 2020, Comput. Oper. Res.
[5] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[6] Archis Ghate,et al. Lagrangian relaxation and constraint generation for allocation and advanced scheduling , 2012, Comput. Oper. Res.
[7] Milind Tambe,et al. Collapsing Bandits and Their Application to Public Health Interventions , 2020, NeurIPS.
[8] NEURWIN: NEURAL WHITTLE INDEX NETWORK FOR , 2020 .
[9] Wouter M. Koolen,et al. Maximin Action Identification: A New Bandit Framework for Games , 2016, COLT.
[10] Yi Wu,et al. Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient , 2019, AAAI.
[11] P. Whittle. Restless Bandits: Activity Allocation in a Changing World , 1988 .
[12] Reza Yaesoubi,et al. Generalized Markov models of infectious disease spread: A novel framework for developing dynamic health policies , 2011, Eur. J. Oper. Res.
[13] Peter G. Taylor,et al. Towards Q-learning the Whittle Index for Restless Bandits , 2019, 2019 Australian & New Zealand Control Conference (ANZCC).
[14] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[15] Rostislav Horcík,et al. Double Oracle Algorithm for Computing Equilibria in Continuous Games , 2020, AAAI.
[16] K. Glazebrook,et al. General notions of indexability for queueing control and asset management , 2011, 1211.1775.
[17] K. Glazebrook,et al. Some indexable families of restless bandit problems , 2006, Advances in Applied Probability.
[18] John N. Tsitsiklis,et al. The complexity of optimal queueing network control , 1994, Proceedings of IEEE 9th Annual Conference on Structure in Complexity Theory.
[19] David Silver,et al. A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning , 2017, NIPS.
[20] Kobi Cohen,et al. Learning in Restless Multiarmed Bandits via Adaptive Arm Sequencing Rules , 2021, IEEE Transactions on Automatic Control.
[21] Olivier Spanjaard,et al. A double oracle approach to minmax regret optimization problems with interval data , 2017, Eur. J. Oper. Res.
[22] Umberto Spagnolini,et al. Optimality of myopic scheduling and whittle indexability for energy harvesting sensors , 2012, 2012 46th Annual Conference on Information Sciences and Systems (CISS).
[23] Mariel S. Lavieri,et al. Optimal Screening for Hepatocellular Carcinoma: A Restless Bandit Model , 2019, Manuf. Serv. Oper. Manag.
[24] Fei Fang,et al. Robust Reinforcement Learning Under Minimax Regret for Green Security , 2021, UAI.
[25] Ambuj Tewari,et al. Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems , 2019, NeurIPS.
[26] K. Glazebrook,et al. On the asymptotic optimality of greedy index heuristics for multi-action restless bandits , 2015, Advances in Applied Probability.
[27] Vincent A. Knight,et al. Nashpy: A Python library for the computation of Nash equilibria , 2018, J. Open Source Softw.
[28] R. Weber,et al. On an index policy for restless bandits , 1990, Journal of Applied Probability.
[29] Jeffrey Thomas Hawkins,et al. A Lagrangian decomposition approach to weakly coupled dynamic optimization problems and its applications , 2003 .
[30] Avrim Blum,et al. Planning in the Presence of Cost Functions Controlled by an Adversary , 2003, ICML.
[31] Yi Wu,et al. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments , 2017, NIPS.
[32] Milind Tambe,et al. Beyond "To Act or Not to Act": Fast Lagrangian Approaches to General Multi-Action Restless Bandits , 2021, AAMAS.
[33] Arpita Biswas,et al. Q-Learning Lagrange Policies for Multi-Action Restless Bandits , 2021, KDD.
[34] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[35] Bo An,et al. Regret-Based Optimization and Preference Elicitation for Stackelberg Security Games with Uncertainty , 2014, AAAI.
[36] V. Borkar,et al. Whittle index based Q-learning for restless bandits with average reward , 2020, Autom.
[37] Pradeep Varakantham,et al. Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare , 2021, IJCAI.
[38] Shane Legg,et al. Human-level control through deep reinforcement learning , 2015, Nature.
[39] Lai Wei,et al. Nonstationary Stochastic Multiarmed Bandits: UCB Policies and Minimax Regret , 2021, ArXiv.
[40] Daniel Adelman,et al. Relaxations of Weakly Coupled Stochastic Dynamic Programs , 2008, Oper. Res.