Learning to Act Greedily: Polymatroid Semi-Bandits

Many important optimization problems, such as the minimum spanning tree and minimum-cost flow problems, can be solved optimally by a greedy method. In this work, we study a learning variant of these problems, in which the model of the problem is unknown and has to be learned by interacting repeatedly with the environment in the bandit setting. We formalize our learning problem quite generally, as learning how to maximize an unknown modular function on a known polymatroid. We propose a computationally efficient algorithm for solving this problem and bound its expected cumulative regret. Our gap-dependent upper bound is tight up to a constant factor, and our gap-free upper bound is tight up to polylogarithmic factors. Finally, we evaluate our method on three problems and demonstrate that it is practical.
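
To make the greedy step concrete, below is a minimal Python sketch (not the authors' implementation) of maximizing a modular function over a polymatroid that is given by an oracle for its submodular rank function. The names `greedy_polymatroid_max`, `weights`, and `rank_fn` are illustrative assumptions, not code from the paper.

```python
# A minimal sketch (not the authors' implementation) of the greedy step referred
# to in the abstract: maximizing a modular function over a polymatroid, assuming
# the polymatroid is given as an oracle `rank_fn` for its submodular rank
# function f (with f(empty set) = 0) and the modular function as a weight dict.
# The names `greedy_polymatroid_max`, `weights`, and `rank_fn` are illustrative.

def greedy_polymatroid_max(weights, rank_fn):
    """Return a maximum-weight base of the polymatroid defined by rank_fn.

    weights : dict mapping item -> non-negative weight of the modular function
    rank_fn : callable taking a frozenset of items and returning f(S)
    """
    x = {e: 0.0 for e in weights}   # the base being built, one coordinate per item
    chosen = frozenset()            # items processed so far
    # Process items in decreasing order of weight and give each item the largest
    # increment the rank function allows; this is the classical greedy method
    # that is optimal on polymatroids.
    for e in sorted(weights, key=weights.get, reverse=True):
        grown = chosen | {e}
        x[e] = float(rank_fn(grown) - rank_fn(chosen))
        chosen = grown
    return x


if __name__ == "__main__":
    # Toy example: a uniform matroid of rank 2 over three items, i.e.
    # f(S) = min(|S|, 2). The two heaviest items ('a' and 'c') receive
    # increment 1; item 'b' receives 0.
    w = {"a": 0.9, "b": 0.5, "c": 0.7}
    print(greedy_polymatroid_max(w, lambda S: min(len(S), 2)))
```

In the bandit setting described in the abstract, one would expect an algorithm of this kind to run the same greedy step on optimistic (e.g., UCB-style) estimates of the unknown weights rather than on the true weights; the sketch above covers only the offline greedy oracle.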
