Learning Geometric Concepts via Gaussian Surface Area

We study the learnability of sets in $\mathbb{R}^n$ under the Gaussian distribution, taking Gaussian surface area as the "complexity measure" of the sets being learned. Let $\mathcal{C}_S$ denote the class of all (measurable) sets with Gaussian surface area at most $S$. We first show that the class $\mathcal{C}_S$ is learnable to any constant accuracy in time $n^{O(S^2)}$, even in the arbitrary noise ("agnostic") model. Complementing this, we also show that any learning algorithm for $\mathcal{C}_S$ information-theoretically requires $2^{\Omega(S^2)}$ examples for learning to constant accuracy. Together, these results show that Gaussian surface area essentially characterizes the computational complexity of learning under the Gaussian distribution.

Our approach yields several new learning results, including the following (all bounds are for learning to any constant accuracy):

• The class of all convex sets can be agnostically learned in time $2^{\tilde{O}(\sqrt{n})}$ (and we prove a $2^{\Omega(\sqrt{n})}$ lower bound for noise-free learning). This is the first subexponential-time algorithm for learning general convex sets, even in the noise-free (PAC) model.
• Intersections of $k$ halfspaces can be agnostically learned in time $n^{O(\log k)}$ (cf. Vempala's [23] $n^{O(k)}$-time algorithm for learning in the noise-free model).
• Cones (with apex centered at the origin) and spheres with arbitrary radius and center can be agnostically learned in time $\mathrm{poly}(n)$.
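The abstract uses Gaussian surface area as its complexity measure without defining it. A standard definition is $\Gamma(A) = \liminf_{\delta \to 0} \gamma_n(A_\delta \setminus A)/\delta$. Here $\gamma_n$ is the standard Gaussian measure on $\mathbb{R}^n$, and $A_\delta$ is the set of points within Euclidean distance $\delta$ of $A$. The Python sketch below is illustrative only and is not the learning algorithm of this work. It estimates $\Gamma(A)$ by Monte Carlo at a fixed small $\delta$ and compares the estimate with closed forms for two simple sets: a halfspace, and a ball centered at the origin. All function names are hypothetical. The fixed $\delta$ and the sample size are assumptions that trade bias against variance.

```python
import math
import random
from typing import Callable, Sequence


def std_normal_pdf(t: float) -> float:
    """Density of the standard one-dimensional Gaussian at t."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def gsa_halfspace_exact(theta: float) -> float:
    """Gaussian surface area of {x : <w, x> <= theta} for any unit vector w."""
    return std_normal_pdf(theta)


def gsa_ball_exact(r: float, n: int) -> float:
    """Gaussian surface area of the origin-centred ball of radius r > 0 in R^n.

    Equals the chi density with n degrees of freedom evaluated at r.
    """
    if r <= 0 or n < 1:
        raise ValueError("need r > 0 and n >= 1")
    log_val = ((n - 1) * math.log(r) - r * r / 2.0
               - (n / 2.0 - 1.0) * math.log(2.0) - math.lgamma(n / 2.0))
    return math.exp(log_val)


def gsa_monte_carlo(dist_to_set: Callable[[Sequence[float]], float], n: int,
                    delta: float = 0.05, samples: int = 100_000,
                    seed: int = 0) -> float:
    """Estimate Gamma(A) by Pr[0 < dist(x, A) <= delta] / delta, x ~ N(0, I_n).

    dist_to_set(x) must return the Euclidean distance from x to A (0 for x in A).
    A fixed delta > 0 introduces bias; fewer samples increase variance.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Count points that lie in the thin outer shell A_delta \ A.
        if 0.0 < dist_to_set(x) <= delta:
            hits += 1
    return hits / (samples * delta)


def dist_to_halfspace(theta: float) -> Callable[[Sequence[float]], float]:
    """Distance to {x : x_0 <= theta} (w = e_0) is max(0, x_0 - theta)."""
    return lambda x: max(0.0, x[0] - theta)


def dist_to_ball(r: float) -> Callable[[Sequence[float]], float]:
    """Distance to the origin-centred ball of radius r is max(0, ||x|| - r)."""
    return lambda x: max(0.0, math.sqrt(sum(v * v for v in x)) - r)


if __name__ == "__main__":
    n = 5
    est_h = gsa_monte_carlo(dist_to_halfspace(0.5), n)
    est_b = gsa_monte_carlo(dist_to_ball(2.0), n)
    print(f"halfspace: MC {est_h:.3f} vs exact {gsa_halfspace_exact(0.5):.3f}")
    print(f"ball:      MC {est_b:.3f} vs exact {gsa_ball_exact(2.0, n):.3f}")
```

For a halfspace $\{x : \langle w, x\rangle \le \theta\}$ with unit $w$, $\Gamma = \varphi(\theta) \le 1/\sqrt{2\pi} \approx 0.4$, so halfspaces have $S = O(1)$. For convex sets, Ball's bound [9] gives $\Gamma(K) \le 4n^{1/4}$. Hence $S^2 = O(\sqrt{n})$, and $n^{O(S^2)} = 2^{O(\sqrt{n}\log n)} = 2^{\tilde{O}(\sqrt{n})}$, which matches the convex-set bound stated above.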

[1] Philip M. Long. On the sample complexity of PAC learning half-spaces against the uniform distribution. IEEE Trans. Neural Networks, 1995.

[2] Santosh S. Vempala et al. The Random Projection Method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 2005.

[3] H. Balsters et al. Learnability with respect to fixed distributions. 1991.

[4] Sergey G. Bobkov et al. On Gaussian and Bernoulli covariance representations. 2001.

[5] Carl E. Rasmussen et al. Gaussian Processes for Machine Learning. Adaptive Computation and Machine Learning, 2005.

[6] Gilles Zémor et al. Discrete Isoperimetric Inequalities and the Probability of a Decoding Error. Combinatorics, Probability and Computing, 2000.

[7] Eric B. Baum et al. The Perceptron Algorithm is Fast for Nonmalicious Distributions. Neural Computation, 1990.

[8] Neil D. Lawrence et al. Semi-supervised Learning via Gaussian Processes. NIPS, 2004.

[9] Keith Ball. The reverse isoperimetric problem for Gaussian measure. Discrete Comput. Geom., 1993.

[10] William Feller et al. An Introduction to Probability Theory and Its Applications. 1967.

[11] F. Nazarov. On the Maximal Perimeter of a Convex Set in $\mathbb{R}^n$ with Respect to a Gaussian Measure. 2003.

[12] P. Patnaik. The Non-central χ²- and F-Distributions and Their Applications. 1949.

[13] Y. Peres. Noise Stability of Weighted Majority. arXiv:math/0412377, 2004.

[14] M. Talagrand. Isoperimetry, logarithmic Sobolev inequalities on the discrete cube, and Margulis' graph connectivity theorem. 1993.

[15] V. Bentkus. On the dependence of the Berry–Esseen bound on dimension. 2003.

[16] D. Bakry. L'hypercontractivité et son utilisation en théorie des semigroupes. 1994.

[17] Radford M. Neal. Pattern Recognition and Machine Learning. Technometrics, 2007.

[18] S. Bobkov. An isoperimetric inequality on the discrete cube, and an elementary proof of the isoperimetric inequality in Gauss space. 1997.

[19] Philip M. Long. Halfspace Learning, Linear Programming, and Nonmalicious Distributions. Inf. Process. Lett., 1994.

[20] Shai Ben-David et al. On the difficulty of approximately maximizing agreements. J. Comput. Syst. Sci., 2000.

[21] Alon Itai et al. Learnability by fixed distributions. COLT '88, 1988.

[22] Alexander A. Sherstov et al. Cryptographic Hardness for Learning Intersections of Halfspaces. 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS '06), 2006.

[23] Santosh S. Vempala et al. A random sampling based algorithm for learning the intersection of half-spaces. Proceedings of the 38th Annual Symposium on Foundations of Computer Science, 1997.

[24] I. Benjamini et al. Noise sensitivity of Boolean functions and applications to percolation. arXiv:math/9811157, 1998.

[25] V. Sudakov et al. Extremal properties of half-spaces for spherically invariant measures. 1978.

[26] Leslie G. Valiant et al. A theory of the learnable. CACM, 1984.

[27] Nader H. Bshouty et al. Maximizing Agreements with One-Sided Error with Applications to Heuristic Learning. Machine Learning, 2005.

[28] C. Borell. The Brunn-Minkowski inequality in Gauss space. 1975.

[29] David Haussler et al. Learnability and the Vapnik-Chervonenkis dimension. JACM, 1989.

[30] M. Talagrand et al. Probability in Banach Spaces. 1991.

[31] S. Bobkov et al. Discrete isoperimetric and Poincaré-type inequalities. 1999.

[32] Umesh V. Vazirani et al. An Introduction to Computational Learning Theory. 1994.

[33] Leslie G. Valiant et al. A general lower bound on the number of examples needed for learning. COLT '88, 1988.

[34] Avrim Blum et al. Learning an Intersection of a Constant Number of Halfspaces over a Uniform Distribution. J. Comput. Syst. Sci., 1997.

[35] J. Lindenstrauss et al. Geometric Aspects of Functional Analysis. 1987.

[36] Eric B. Baum et al. On learning a union of half spaces. J. Complexity, 1990.

[37] Rocco A. Servedio et al. Agnostically learning halfspaces. 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS '05), 2005.

[38] S. Janson. Gaussian Hilbert Spaces. 1997.

[39] Rocco A. Servedio et al. Learning intersections and thresholds of halfspaces. Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.

[40] P. Patnaik. The Non-central χ²- and F-Distributions and Their Applications. 1949.

[41] Nader H. Bshouty et al. On the Fourier spectrum of monotone functions. JACM, 1996.

[42] M. Ledoux. Semigroup proofs of the isoperimetric inequality in Euclidean and Gauss space. 1994.

[43] V. Rich. Personal communication. Nature, 1989.

[44] Colin McDiarmid et al. On the method of bounded differences. Surveys in Combinatorics, 1989.

[45] Rocco A. Servedio et al. Learning intersections of halfspaces with a margin. J. Comput. Syst. Sci., 2004.

[46] L. Gross. Logarithmic Sobolev Inequalities. 1975.

[47] Zoubin Ghahramani et al. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. ICML, 2003.

[48] Michel Talagrand et al. How much are increasing sets positively correlated? Combinatorica, 1996.

[49] Christian Houdré et al. Some Connections Between Isoperimetric and Sobolev-Type Inequalities. 1997.

[50] William Feller et al. An Introduction to Probability Theory and Its Applications. 1951.

[51] G. Pisier. Probabilistic methods in the geometry of Banach spaces. 1986.

[52] Stephen Kwek et al. PAC Learning Intersections of Halfspaces with Membership Queries. Algorithmica, 1998.