Thodoris Lykouris | Karthik Sridharan | Éva Tardos
[1] T. L. Lai, Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .
[2] James Hannan,et al. Approximation to Bayes Risk in Repeated Play , 1958 .
[3] Gergely Neu,et al. Explore no more: Improved high-probability regret bounds for non-stochastic bandits , 2015, NIPS.
[4] Claudio Gentile,et al. Adaptive and Self-Confident On-Line Learning Algorithms , 2000, J. Comput. Syst. Sci..
[5] Karthik Sridharan,et al. Online Learning with Predictable Sequences , 2012, COLT.
[6] Shie Mannor,et al. From Bandits to Experts: On the Value of Side-Observations , 2011, NIPS.
[7] Rémi Munos,et al. Efficient learning by implicit exploration in bandit problems with side observations , 2014, NIPS.
[8] Baruch Awerbuch,et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches , 2004, STOC '04.
[9] Haipeng Luo,et al. Achieving All with No Parameters: AdaNormalHedge , 2015, COLT.
[10] Gergely Neu,et al. First-order regret bounds for combinatorial semi-bandits , 2015, COLT.
[11] Noga Alon,et al. Online Learning with Feedback Graphs: Beyond Bandits , 2015, COLT.
[12] Jean-Yves Audibert,et al. Regret Bounds and Minimax Policies under Partial Monitoring , 2010, J. Mach. Learn. Res..
[13] Karthik Sridharan,et al. Online Non-Parametric Regression , 2014, COLT.
[14] Gergely Neu,et al. An Efficient Algorithm for Learning with Semi-bandit Feedback , 2013, ALT.
[15] Gábor Lugosi,et al. Prediction, learning, and games , 2006 .
[16] Noga Alon,et al. From Bandits to Experts: A Tale of Domination and Independence , 2013, NIPS.
[17] Gábor Lugosi,et al. Minimizing regret with label efficient prediction , 2004, IEEE Transactions on Information Theory.
[18] J. Langford,et al. The Epoch-Greedy algorithm for contextual multi-armed bandits , 2007, NIPS.
[19] Avrim Blum,et al. Routing without regret: on convergence to nash equilibria of regret-minimizing algorithms in routing games , 2006, PODC '06.
[20] Yuanzhi Li,et al. Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits , 2018, ICML.
[21] Mohammad Taghi Hajiaghayi,et al. Regret minimization and the price of total anarchy , 2008, STOC.
[22] Ambuj Tewari,et al. Online learning via sequential complexities , 2010, J. Mach. Learn. Res..
[23] Tim Roughgarden,et al. Minimizing Regret with Multiple Reserves , 2016, EC.
[24] Seshadhri Comandur,et al. Efficient learning algorithms for changing environments , 2009, ICML '09.
[25] Avrim Blum,et al. Near-optimal online auctions , 2005, SODA '05.
[26] Karthik Sridharan,et al. On Equivalence of Martingale Tail Bounds and Deterministic Regret Inequalities , 2015, COLT.
[27] Amit Daniely,et al. Strongly Adaptive Online Learning , 2015, ICML.
[28] Mark Herbster,et al. Tracking the Best Expert , 1995, Machine Learning.
[29] John Langford,et al. Contextual Bandit Algorithms with Supervised Learning Guarantees , 2010, AISTATS.
[30] Ambuj Tewari,et al. Online Learning: Random Averages, Combinatorial Parameters, and Learnability , 2010, NIPS.
[31] Peter Auer,et al. The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..
[32] Gábor Lugosi,et al. Regret in Online Combinatorial Optimization , 2012, Math. Oper. Res..
[33] Elad Hazan,et al. Logarithmic regret algorithms for online convex optimization , 2006, Machine Learning.
[34] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[35] Karthik Sridharan,et al. BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits , 2016, ICML.
[36] Santosh S. Vempala,et al. Efficient algorithms for online decision problems , 2005, J. Comput. Syst. Sci..
[37] Noga Alon,et al. Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback , 2014, SIAM J. Comput..
[38] Michal Valko,et al. Online Learning with Noisy Side Observations , 2016, AISTATS.
[39] Manfred K. Warmuth,et al. The Weighted Majority Algorithm , 1994, Inf. Comput..
[40] Christos Dimitrakakis,et al. Thompson Sampling for Stochastic Bandits with Graph Feedback , 2017, AAAI.
[41] Éva Tardos,et al. Learning and Efficiency in Games with Dynamic Population , 2015, SODA.
[42] Peter Auer,et al. Hannan Consistency in On-Line Learning in Case of Unbounded Losses Under Partial Monitoring , 2006, ALT.
[43] Claudio Gentile,et al. Regret Minimization for Reserve Prices in Second-Price Auctions , 2015, IEEE Transactions on Information Theory.
[44] T. Cover. Universal Portfolios , 1996 .
[45] Akshay Krishnamurthy,et al. Efficient Algorithms for Adversarial Contextual Learning , 2016, ICML.
[46] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[47] John Langford,et al. Open Problem: First-Order Regret Bounds for Contextual Bandits , 2017, COLT.
[48] Éva Tardos,et al. Learning in Games: Robustness of Fast Convergence , 2016, NIPS.
[49] Gilles Stoltz. Incomplete information and internal regret in prediction of individual sequences , 2005 .
[50] Tamir Hazan,et al. Online Learning with Feedback Graphs Without the Graphs , 2016, ICML.