Near-Optimal Herding

Herding is an algorithm of recent interest in the machine learning community, motivated by inference in Markov random fields. It solves the following sampling problem: given a set X ⊆ R^d with mean μ, construct an infinite sequence of points from X such that, for every t ≥ 1, the mean of the first t points in that sequence lies within Euclidean distance O(1/t) of μ. The classic Perceptron boundedness theorem implies that such a result actually holds for a wide class of algorithms, although the factors suppressed by the O(1/t) notation are exponential in d. Thus, to establish a non-trivial result for the sampling problem, one must carefully analyze the factors suppressed by the O(1/t) error bound. This paper studies the best error that can be achieved for the sampling problem. Known analyses of the Herding algorithm give an error bound that depends on geometric properties of X but, even under favorable conditions, this bound depends linearly on d. We present a new polynomial-time algorithm that solves the sampling problem with error O(√d · log^2.5 |X| / t), assuming that X is finite. Our algorithm is based on recent algorithmic results in discrepancy theory. We also show that any algorithm for the sampling problem must have error Ω(√d / t). This implies that our algorithm is optimal up to logarithmic factors.
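For concreteness, the classic herding update (the baseline this paper improves on, not the paper's discrepancy-based algorithm) keeps a weight vector w, greedily picks the point of X most aligned with w, and then updates w by the residual μ − x. The sketch below is a minimal illustration under the assumption that X is given as a finite array of points; the function name `herding` is ours.

```python
import numpy as np

def herding(X, num_steps):
    """Classic herding sketch: greedily pick points of X so that the
    running mean of the chosen points tracks the true mean mu.

    X: (n, d) array of points. Returns the chosen indices and the
    Euclidean error ||mean of chosen points - mu|| after num_steps steps.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    w = mu.copy()              # weight vector, initialized at the mean
    chosen = []
    for _ in range(num_steps):
        # greedy step: the point of X most aligned with the weight vector
        i = int(np.argmax(X @ w))
        chosen.append(i)
        w = w + mu - X[i]      # herding update: accumulate the residual
    running_mean = X[chosen].mean(axis=0)
    return chosen, float(np.linalg.norm(running_mean - mu))
```

On a toy instance such as the standard basis vectors in R^2, the update simply alternates between the two points, so the running mean converges to μ at the O(1/t) rate discussed in the abstract; the point of the paper is that the constants hidden by that rate can be made polynomial in d rather than exponential.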
