Universal ε-approximators for integrals

Let X be a space and F a family of {0,1}-valued functions on X. Vapnik and Chervonenkis showed that if F is "simple" (finite VC dimension), then for every probability measure μ on X and every ε > 0 there is a finite set S such that for all f ∈ F, (1/|S|) Σ_{x∈S} f(x) = ∫ f(x) dμ(x) ± ε. Think of S as a "universal ε-approximator" for integration over F. Such an S can in fact be obtained with high probability simply by sampling a few points from μ. This result is a mainstay of computational learning theory; it was later extended by other authors to families of bounded (e.g., [0,1]-valued) real functions. In this work we establish similar universal ε-approximators for families of unbounded nonnegative real functions, in particular for the families over which one optimizes when performing data classification. (In this case the ε-approximation must be multiplicative.) Specifically, let F be the family of "k-median functions" (or k-means, etc.) on R^d with an arbitrary norm ϱ: any set of centers u_1, ..., u_k ∈ R^d determines an f by f(x) = (min_i ϱ(x - u_i))^α, where α ≥ 0. Then for every measure μ on R^d there exist a set S of cardinality poly(k, d, 1/ε) and a measure ν supported on S such that for every f ∈ F, Σ_{x∈S} f(x) ν(x) ∈ (1 ± ε) · ∫ f(x) dμ(x).
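To make the additive Vapnik-Chervonenkis guarantee concrete, here is a minimal numerical sketch, not a construction from the paper: for the finite-VC family of threshold functions f_t(x) = 1[x ≤ t] on R, a modest uniform sample S from μ approximates every integral in the family simultaneously. The choice of a Gaussian μ, the sample sizes, and the threshold grid are illustrative assumptions.

```python
# Minimal sketch of the VC sampling phenomenon (illustrative, not the
# paper's construction): the family F = { f_t(x) = 1[x <= t] : t in R }
# has VC dimension 1, so a modest uniform sample S from mu should give
#   sup_t | (1/|S|) sum_{x in S} f_t(x) - integral f_t dmu | <= eps.
import numpy as np

rng = np.random.default_rng(0)
mu_sample = rng.normal(size=500_000)  # large Monte Carlo stand-in for mu (standard Gaussian, an assumption)
S = rng.normal(size=4_000)            # candidate universal eps-approximator: a few samples from mu

thresholds = np.linspace(-3.0, 3.0, 200)                        # a grid over the family {f_t}
emp = np.array([(S <= t).mean() for t in thresholds])           # sample averages over S
true = np.array([(mu_sample <= t).mean() for t in thresholds])  # ~ integral of f_t against mu
print("max additive error over the family:", np.abs(emp - true).max())  # roughly O(1/sqrt(|S|))
```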

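For the multiplicative k-median guarantee, the sketch below builds a weighted set (S, ν) by importance sampling with a score inspired by sensitivity-based sampling. The simple score used here (normalized distance to a crude set of centers plus a uniform term), the synthetic data, and all parameter choices are illustrative assumptions, not the paper's exact construction.

```python
# Hedged sketch of a multiplicative (1 +/- eps)-approximator for k-median
# cost via importance sampling; the weighting scheme below is a simplified
# illustration in the spirit of sensitivity sampling, not the paper's method.
import numpy as np

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(loc=c, scale=0.5, size=(3000, 2))
                    for c in [(0, 0), (5, 0), (0, 5)]])  # points standing in for mu
n, k, alpha = len(X), 3, 1.0

def cost(points, centers, weights=None):
    # k-median-style cost: sum_x w(x) * (min_i ||x - u_i||)^alpha
    d = np.min(np.linalg.norm(points[:, None] - centers[None], axis=2), axis=1) ** alpha
    return d @ (np.ones(len(points)) if weights is None else weights)

# Crude reference centers: k points chosen uniformly from the data.
C0 = X[rng.choice(n, k, replace=False)]
d0 = np.min(np.linalg.norm(X[:, None] - C0[None], axis=2), axis=1)
score = d0 / d0.sum() + 1.0 / n   # simple importance score (an assumption, not a proven sensitivity bound)
p = score / score.sum()

m = 400                                 # |S|; the paper proves poly(k, d, 1/eps) suffices
idx = rng.choice(n, m, p=p)
S, nu = X[idx], 1.0 / (m * p[idx])      # weighted measure nu on S (unbiased in expectation)

for _ in range(3):                      # test a few f in F, i.e., random center sets
    U = rng.uniform(-1.0, 6.0, size=(k, 2))
    print(cost(S, U, nu) / cost(X, U))  # should fall within 1 +/- eps of 1
```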