Embedding Hard Learning Problems into Gaussian Space

We give the first representation-independent hardness result for agnostically learning halfspaces with respect to the Gaussian distribution. We reduce from the problem of learning sparse parities with noise with respect to the uniform distribution on the hypercube (sparse LPN), a notoriously hard problem in theoretical computer science, and show that any algorithm for agnostically learning halfspaces requires $n^{\Omega(\log(1/\epsilon))}$ time under the assumption that k-sparse LPN requires $n^{\Omega(k)}$ time, ruling out a polynomial-time algorithm for the problem. As far as we are aware, this is the first representation-independent hardness result for supervised learning when the underlying distribution is restricted to be Gaussian. We also show that agnostically learning sparse polynomials with respect to the Gaussian distribution in polynomial time is as hard as PAC learning DNFs on the uniform distribution in polynomial time. This complements the surprising result of Andoni et al. [1], who show that sparse polynomials are learnable under random Gaussian noise in polynomial time. Taken together, these results show the inherent difficulty of designing supervised learning algorithms in Euclidean space even in the presence of strong distributional assumptions. Our results use a novel embedding of random labeled examples from the uniform distribution on the Boolean hypercube into random labeled examples from the Gaussian distribution, which allows us to relate the hardness of learning problems on two different domains and distributions.

1998 ACM Subject Classification: F.2.0 Analysis of Algorithms and Problem Complexity
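As a rough illustration of the objects named in the abstract, the sketch below draws one noisy example of a k-sparse parity on the hypercube {-1, 1}^n and then maps the Boolean point to a Gaussian point by scaling each sign with an independent half-normal magnitude. That map uses only the standard fact that a uniform random sign times |g|, with g standard normal, is itself standard normal. It is an assumption made here for illustration and is not claimed to be the embedding constructed in the paper. The names `sparse_lpn_sample`, `embed_in_gaussian`, `secret`, and `eta` are hypothetical.

```python
import random


def sparse_lpn_sample(n, secret, eta, rng):
    """Draw one noisy example of the parity over the index set `secret`.

    x is uniform on {-1, 1}^n; the label is the product of the secret
    coordinates, flipped independently with probability `eta`.
    """
    x = [rng.choice((-1, 1)) for _ in range(n)]
    label = 1
    for i in secret:
        label *= x[i]
    if rng.random() < eta:
        label = -label
    return x, label


def embed_in_gaussian(x, rng):
    """Scale each sign by an independent half-normal magnitude.

    Illustrative assumption: if x_i is a uniform sign, x_i * |g_i| is a
    standard normal, so the output has the Gaussian marginal distribution.
    """
    return [xi * abs(rng.gauss(0.0, 1.0)) for xi in x]


rng = random.Random(0)
n, secret, eta = 8, (1, 4, 6), 0.1   # hypothetical parameters, k = 3
x, y = sparse_lpn_sample(n, secret, eta, rng)
z = embed_in_gaussian(x, rng)        # sign(z_i) == x_i for every coordinate
```

Under this map the parity label becomes a function of the signs of the Gaussian coordinates, which gives some intuition for how a hard problem on the hypercube can be restated over Gaussian space.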

[1]  Daniel M. Kane,et al.  Bounded Independence Fools Degree-2 Threshold Functions , 2010, 2010 IEEE 51st Annual Symposium on Foundations of Computer Science.

[2]  R. Schapire,et al.  Toward efficient agnostic learning , 2004, Machine Learning.

[3]  Pravesh Kothari,et al.  Representation, Approximation and Learning of Submodular Functions Using Low-rank Decision Trees , 2013, COLT.

[4]  Rocco A. Servedio,et al.  Agnostically learning halfspaces , 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[5]  Leslie G. Valiant,et al.  A theory of the learnable , 1984, STOC '84.

[6]  Gregory Valiant,et al.  Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas , 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[7]  Philippe Rigollet,et al.  Complexity Theoretic Lower Bounds for Sparse Principal Component Detection , 2013, COLT.

[8]  Vitaly Feldman,et al.  On Using Extended Statistical Queries to Avoid Membership Queries , 2001, J. Mach. Learn. Res..

[9]  Jeffrey C. Jackson.  An Efficient Membership-Query Algorithm for Learning DNF with Respect to the Uniform Distribution , 1997, J. Comput. Syst. Sci..

[10]  Vitaly Feldman.  A Complete Characterization of Statistical Query Learning with Applications to Evolvability , 2009, FOCS.

[11]  Yoav Freund,et al.  An improved boosting algorithm and its implications on learning complexity , 1992, COLT '92.

[12]  Eyal Kushilevitz,et al.  Learning decision trees using the Fourier spectrum , 1991, STOC '91.

[13]  Alan M. Frieze,et al.  A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions , 1996, Algorithmica.

[14]  Yoav Freund,et al.  Boosting a weak learning algorithm by majority , 1990, COLT '90.

[15]  Alexandr Andoni,et al.  Learning Sparse Polynomial Functions , 2014, SODA.

[16]  Linda Sellie,et al.  Toward efficient agnostic learning , 1992, COLT '92.

[17]  Rocco A. Servedio,et al.  Hardness results for agnostically learning low-degree polynomial threshold functions , 2011, SODA '11.

[18]  Ohad Shamir,et al.  Learning Kernel-Based Halfspaces with the 0-1 Loss , 2011, SIAM J. Comput..

[19]  Hans Ulrich Simon,et al.  Efficient Learning of Linear Perceptrons , 2000, NIPS.

[20]  Pravesh Kothari,et al.  Constructing Hard Functions from Learning Algorithms , 2013, Electron. Colloquium Comput. Complex..

[21]  Alexander A. Sherstov,et al.  Lower Bounds for Agnostic Learning via Approximate Rank , 2010, computational complexity.

[22]  Ryan O'Donnell,et al.  Learning Geometric Concepts via Gaussian Surface Area , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[23]  Varun Kanade,et al.  Computational Bounds on Statistical Query Learning , 2012, COLT.

[24]  Rocco A. Servedio,et al.  Learning intersections and thresholds of halfspaces , 2002, The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings..

[25]  Shai Shalev-Shwartz,et al.  Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs , 2012, NIPS.

[26]  Vitaly Feldman,et al.  On Agnostic Learning of Parities, Monomials, and Halfspaces , 2009, SIAM J. Comput..

[27]  Alexander A. Sherstov,et al.  Cryptographic Hardness for Learning Intersections of Halfspaces , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[28]  Pravesh Kothari,et al.  Constructing Hard Functions Using Learning Algorithms , 2013, 2013 IEEE Conference on Computational Complexity.

[29]  Rocco A. Servedio,et al.  Boosting and Hard-Core Set Construction , 2003, Machine Learning.

[30]  David Haussler,et al.  Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications , 1992, Inf. Comput..

[31]  Nathan Linial,et al.  From average case complexity to improper learning complexity , 2013, STOC.

[32]  David Cash,et al.  Efficient Authentication from Hard Learning Problems , 2011, Journal of Cryptology.

[33]  F. Rosenblatt,et al.  The perceptron: a probabilistic model for information storage and organization in the brain , 1958, Psychological Review.

[34]  Jeffrey C. Jackson,et al.  An efficient membership-query algorithm for learning DNF with respect to the uniform distribution , 1994, Proceedings 35th Annual Symposium on Foundations of Computer Science.

[35]  Ohad Shamir,et al.  Learning Kernel-Based Halfspaces with the Zero-One Loss , 2010, COLT 2010.

[36]  David Cash,et al.  Efficient Authentication from Hard Learning Problems , 2011, EUROCRYPT.

[37]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[38]  Daniel M. Kane,et al.  Learning Halfspaces Under Log-Concave Densities: Polynomial Approximations and Moment Matching , 2013, COLT.

[39]  Prasad Raghavendra,et al.  Hardness of Learning Halfspaces with Noise , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[40]  Vladimir Vapnik,et al.  Statistical learning theory , 1998.