Yishay Mansour | Aviv Rosenberg | Tal Lancewicki
[1] Yishay Mansour,et al. Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions , 2021, ICML.
[2] Max Simchowitz,et al. Exploration and Incentives in Reinforcement Learning , 2021, ArXiv.
[3] Haipeng Luo,et al. Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case , 2021, ICML.
[4] Haipeng Luo,et al. Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition , 2020, COLT.
[5] Pooria Joulani,et al. Adapting to Delays and Data in Adversarial Multi-Armed Bandits , 2020, ICML.
[6] Aleksandrs Slivkins,et al. Corruption Robust Exploration in Episodic Reinforcement Learning , 2019, COLT.
[7] Quanquan Gu,et al. Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation , 2021, ArXiv.
[8] Haipeng Luo,et al. Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition , 2020, ICML.
[9] Yishay Mansour,et al. Adversarial Stochastic Shortest Path , 2020, ArXiv.
[10] Michal Valko,et al. Stochastic bandits with arm-dependent delays , 2020, ICML.
[11] Haipeng Luo,et al. Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition , 2020, NeurIPS.
[12] Baiming Chen,et al. Delay-Aware Multi-Agent Reinforcement Learning , 2020, ArXiv.
[13] Mykel J. Kochenderfer,et al. Learning Near Optimal Policies with Low Inherent Bellman Error , 2020, ICML.
[14] Haim Kaplan,et al. Near-optimal Regret Bounds for Stochastic Shortest Path , 2020, ICML.
[15] Shie Mannor,et al. Optimistic Policy Optimization with Bandit Feedback , 2020, ICML.
[16] Csaba Szepesvári,et al. A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds , 2020, Theor. Comput. Sci.
[17] Chi Jin,et al. Provably Efficient Exploration in Policy Optimization , 2019, ICML.
[18] Alessandro Lazaric,et al. No-Regret Exploration in Goal-Oriented Reinforcement Learning , 2019, ICML.
[19] Alessandro Lazaric,et al. Frequentist Regret Bounds for Randomized Least-Squares Value Iteration , 2019, AISTATS.
[20] Julian Zimmert,et al. An Optimal Algorithm for Adversarial Bandits with Arbitrary Delays , 2019, AISTATS.
[21] Michael I. Jordan,et al. Provably Efficient Reinforcement Learning with Linear Function Approximation , 2019, COLT.
[22] Nicolò Cesa-Bianchi,et al. Nonstochastic Multiarmed Bandits with Unrestricted Delays , 2019, NeurIPS.
[23] Yishay Mansour,et al. Online Convex Optimization in Adversarial Markov Decision Processes , 2019, ICML.
[24] Max Simchowitz,et al. Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs , 2019, NeurIPS.
[25] Mengdi Wang,et al. Sample-Optimal Parametric Q-Learning Using Linearly Additive Features , 2019, ICML.
[26] Emma Brunskill,et al. Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds , 2019, ICML.
[27] Xi Chen,et al. Online EXP3 Learning in Adversarial Bandits with Delayed Feedback , 2019, NeurIPS.
[28] Yishay Mansour,et al. Online Stochastic Shortest Path with Bandit Feedback and Unknown Transition Function , 2019, NeurIPS.
[29] Renyuan Xu,et al. Learning in Generalized Linear Contextual Bandits with Stochastic Delays , 2019, NeurIPS.
[30] Michael I. Jordan,et al. Is Q-learning Provably Efficient? , 2018, NeurIPS.
[31] Emma Brunskill,et al. Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs , 2018, ICML.
[32] Claudio Gentile,et al. Nonstochastic Bandits with Composite Anonymous Feedback , 2018, COLT.
[33] James Bergstra,et al. Setting up a Reinforcement Learning Task with a Real-World Robot , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
[34] Alessandro Lazaric,et al. Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning , 2018, ICML.
[35] Sergey Levine,et al. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor , 2018, ICML.
[36] Csaba Szepesvári,et al. Bandits with Delayed, Aggregated Anonymous Feedback , 2017, ICML.
[37] Alec Radford,et al. Proximal Policy Optimization Algorithms , 2017, ArXiv.
[38] Vianney Perchet,et al. Stochastic Bandit Models for Delayed Conversions , 2017, UAI.
[39] Rémi Munos,et al. Minimax Regret Bounds for Reinforcement Learning , 2017, ICML.
[40] Tor Lattimore,et al. Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning , 2017, NIPS.
[41] Elad Hazan,et al. Introduction to Online Convex Optimization , 2016, Found. Trends Optim.
[42] Benjamin Van Roy,et al. On Lower Bounds for Regret in Reinforcement Learning , 2016, ArXiv.
[43] Claudio Gentile,et al. Delay and Cooperation in Nonstochastic Bandits , 2016, COLT.
[44] Benjamin Van Roy,et al. Generalization and Exploration via Randomized Value Functions , 2014, ICML.
[45] Kent Quanrud,et al. Online Learning with Adversarial Delays , 2015, NIPS.
[46] Peter Xiaoping Liu,et al. Impact of Communication Delays on Secondary Frequency Control in an Islanded Microgrid , 2015, IEEE Transactions on Industrial Electronics.
[47] Sergey Levine,et al. Trust Region Policy Optimization , 2015, ICML.
[48] Csaba Szepesvári,et al. Online Markov Decision Processes Under Bandit Feedback , 2010, IEEE Transactions on Automatic Control.
[49] Gergely Neu,et al. Online learning in episodic Markovian decision processes by relative entropy policy search , 2013, NIPS.
[50] András György,et al. Online Learning under Delayed Feedback , 2013, ICML.
[51] Alexander Zimin. Online Learning in Markovian Decision Processes , 2013.
[52] Bessem Sayadi,et al. Online learning for QoE-based video streaming to mobile receivers , 2012, 2012 IEEE Globecom Workshops.
[53] András György,et al. The adversarial stochastic shortest path problem with unknown transition probabilities , 2012, AISTATS.
[54] Robert Babuska,et al. Control delay in Reinforcement Learning for real-time dynamic systems: A memoryless approach , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[55] Csaba Szepesvári,et al. The Online Loop-free Stochastic Shortest-Path Problem , 2010, Annual Conference Computational Learning Theory.
[56] Yishay Mansour,et al. Online Markov Decision Processes , 2009, Math. Oper. Res.
[57] Massimiliano Pontil,et al. Empirical Bernstein Bounds and Sample-Variance Penalization , 2009, COLT.
[58] Ambuj Tewari,et al. REGAL: A Regularization based Algorithm for Reinforcement Learning in Weakly Communicating MDPs , 2009, UAI.
[59] Thomas J. Walsh,et al. Learning and planning in environments with delayed feedback , 2009, Autonomous Agents and Multi-Agent Systems.
[60] Peter Auer,et al. Near-optimal Regret Bounds for Reinforcement Learning , 2008, J. Mach. Learn. Res.
[61] Shie Mannor,et al. Markov Decision Processes with Arbitrary Reward Processes , 2008, Math. Oper. Res.
[62] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[63] Marc Teboulle,et al. Mirror descent and nonlinear projected subgradient methods for convex optimization , 2003, Oper. Res. Lett.
[64] Konstantinos V. Katsikopoulos,et al. Markov decision processes with delays and asynchronous cost collection , 2003, IEEE Trans. Autom. Control.
[65] E. Ordentlich,et al. On delayed prediction of individual sequences , 2002, Proceedings IEEE International Symposium on Information Theory.
[66] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput.
[67] Sham M. Kakade,et al. A Natural Policy Gradient , 2001, NIPS.