Online Decision-Making in General Combinatorial Spaces

We study online combinatorial decision problems, where one must make sequential decisions in some combinatorial space without knowing in advance the cost of decisions on each trial; the goal is to minimize the total regret over some sequence of trials relative to the best fixed decision in hindsight. Such problems have been studied mostly in settings where decisions are represented by Boolean vectors and costs are linear in this representation. Here we study a general setting where costs may be linear in any suitable low-dimensional vector representation of elements of the decision space. We give a general algorithm for such problems that we call low-dimensional online mirror descent (LDOMD); the algorithm generalizes both the Component Hedge algorithm of Koolen et al. (2010) and a recent algorithm of Suehiro et al. (2012). Our study offers a unification and generalization of previous work, and emphasizes the role of the convex polytope arising from the vector representation of the decision space: Boolean representations lead to 0-1 polytopes, whereas more general vector representations lead to more general polytopes. We study several examples of both types of polytopes. Finally, we demonstrate the benefit of having a general framework for such problems via an application to an online transportation problem; the associated transportation polytopes generalize the Birkhoff polytope of doubly stochastic matrices, and the resulting algorithm generalizes the PermELearn algorithm of Helmbold and Warmuth (2009).
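To make the transportation setting more concrete, the following is a minimal sketch of one update of the kind that PermELearn-style methods use: an entrywise multiplicative step followed by alternating row and column rescaling (Sinkhorn-style balancing) back to a matrix with prescribed row and column sums. It is not the LDOMD algorithm of the paper; the function names, the learning rate `eta`, the fixed iteration count, and the 2x3 example data are all assumptions made for illustration.

```python
import math

def multiplicative_update(W, loss, eta):
    """Entrywise exponentiated step: W[i][j] * exp(-eta * loss[i][j])."""
    return [[w * math.exp(-eta * l) for w, l in zip(w_row, l_row)]
            for w_row, l_row in zip(W, loss)]

def scale_to_marginals(W, row_sums, col_sums, iters=500):
    """Alternately rescale rows and columns so that the matrix approaches
    the prescribed row and column sums (assumes strictly positive entries
    and sum(row_sums) == sum(col_sums))."""
    W = [row[:] for row in W]  # work on a copy
    n_rows = len(W)
    for _ in range(iters):
        # Rescale each row to its target sum.
        for i, target in enumerate(row_sums):
            s = sum(W[i])
            W[i] = [w * target / s for w in W[i]]
        # Rescale each column to its target sum.
        for j, target in enumerate(col_sums):
            s = sum(W[i][j] for i in range(n_rows))
            for i in range(n_rows):
                W[i][j] *= target / s
    return W

# Hypothetical 2x3 transportation problem (illustrative data only).
row_sums = [2.0, 1.0]
col_sums = [1.0, 1.0, 1.0]
total = sum(row_sums)

# Starting point with the correct marginals: W[i][j] = r_i * c_j / total.
W = [[r * c / total for c in col_sums] for r in row_sums]

# Per-entry losses observed on one trial (assumed values).
loss = [[1.0, 0.0, 0.5],
        [0.0, 1.0, 0.5]]

W_next = scale_to_marginals(multiplicative_update(W, loss, eta=0.5),
                            row_sums, col_sums)
```

With equal row and column sums of 1 on a square matrix, the same rescaling targets the Birkhoff polytope of doubly stochastic matrices, which is the special case the abstract mentions.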

[1] Francis R. Bach, et al. Learning with Submodular Functions: A Convex Optimization Perspective, 2011, Found. Trends Mach. Learn.

[2] Shuji Kijima, et al. Online Prediction under Submodular Constraints, 2012, ALT.

[3] Sébastien Bubeck, et al. Introduction to Online Optimization, 2011.

[4] Tong Zhang, et al. Statistical Analysis of Bayes Optimal Subset Ranking, 2008, IEEE Transactions on Information Theory.

[5] Manfred K. Warmuth, et al. Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension, 2008.

[6] Santosh S. Vempala, et al. Efficient algorithms for online decision problems, 2005, J. Comput. Syst. Sci.

[7] G. Ziegler. Lectures on Polytopes, 1994.

[8] Martin Grötschel, et al. Facets of the linear ordering polytope, 1985, Math. Program.

[9] Nir Ailon, et al. Bandit Online Optimization over the Permutahedron, 2014, ALT.

[10] Gábor Lugosi, et al. Regret in Online Combinatorial Optimization, 2012, Math. Oper. Res.

[11] S. Halfin. Arbitrarily Complex Corner Polyhedra are Dense in $R^n$, 1972.

[12] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[13] Manfred K. Warmuth, et al. Learning Permutations with Exponential Weights, 2007, COLT.

[14] R. Brualdi. Combinatorial Matrix Classes, 2006.

[15] Manfred K. Warmuth, et al. Path Kernels and Multiplicative Updates, 2002, J. Mach. Learn. Res.

[16] Nicolò Cesa-Bianchi, et al. Combinatorial Bandits, 2012, COLT.

[17] Masayuki Takeda, et al. Online Linear Optimization over Permutations, 2011, ISAAC.

[18] Masayuki Takeda, et al. Online Rank Aggregation, 2012, ACML.

[19] Nir Ailon, et al. Online Ranking: Discrete Choice, Spearman Correlation and Other Feedback, 2013, ArXiv.

[20] Jun Zhang. Binary choice, subset choice, random utility, and ranking: A unified perspective using the permutahedron, 2004.