Learning Proportionally Fair Allocations with Low Regret
[1] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[2] Wei Chen,et al. Combinatorial Multi-Armed Bandit: General Framework and Applications , 2013, ICML.
[3] Aurélien Garivier,et al. The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond , 2011, COLT.
[4] Imre Csiszár,et al. Information Theory and Statistics: A Tutorial , 2004, Found. Trends Commun. Inf. Theory.
[5] Wei Chen,et al. Combinatorial Multi-Armed Bandit with General Reward Functions , 2016, NIPS.
[6] A. Schrijver. A Course in Combinatorial Optimization , 1990 .
[7] Thomas P. Hayes,et al. Stochastic Linear Optimization under Bandit Feedback , 2008, COLT.
[8] Mohammad Sadegh Talebi,et al. Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs , 2018, ALT.
[9] Yashodhan Kanoria,et al. Matching while Learning , 2016, EC.
[10] Jean C. Walrand,et al. Fair end-to-end window-based congestion control , 2000, TNET.
[11] Laurent Massoulié,et al. A queueing analysis of max-min fairness, proportional fairness and balanced fairness , 2006, Queueing Syst. Theory Appl..
[12] David Tse,et al. Fundamentals of Wireless Communication , 2005 .
[13] Zheng Wen,et al. Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits , 2014, AISTATS.
[14] Alexandre Proutière,et al. Distributed Proportional Fair Load Balancing in Heterogeneous Systems , 2015, SIGMETRICS.
[15] Sham M. Kakade,et al. Stochastic Convex Optimization with Bandit Feedback , 2011, SIAM J. Optim..
[16] Koby Crammer,et al. Optimal Resource Allocation with Semi-Bandit Feedback , 2014, UAI.
[17] T. L. Graves,et al. Asymptotically Efficient Adaptive Choice of Control Laws in Controlled Markov Chains , 1997 .
[18] Peter Auer,et al. Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.
[19] Richard Combes,et al. Stochastic Online Shortest Path Routing: The Value of Feedback , 2013, IEEE Transactions on Automatic Control.
[20] Bhaskar Krishnamachari,et al. Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations , 2010, IEEE/ACM Transactions on Networking.
[21] T. L. Lai and Herbert Robbins. Asymptotically Efficient Adaptive Allocation Rules , 1985 .
[22] Aurélien Garivier,et al. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models , 2014, J. Mach. Learn. Res..
[23] Koby Crammer,et al. Linear Multi-Resource Allocation with Semi-Bandit Feedback , 2015, NIPS.
[24] Frank Kelly,et al. Rate control for communication networks: shadow prices, proportional fairness and stability , 1998, J. Oper. Res. Soc..
[25] Aurélien Garivier,et al. Explore First, Exploit Next: The True Shape of Regret in Bandit Problems , 2016, Math. Oper. Res..
[26] F. Topsøe. Some Bounds for the Logarithmic Function , 2004 .
[27] Alexandre Proutière,et al. Combinatorial Bandits Revisited , 2015, NIPS.