Haipeng Luo | Chen-Yu Wei | Chung-Wei Lee | Mengxiao Zhang