Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time

We give a polynomial-time algorithm for learning high-dimensional halfspaces with margins in $d$-dimensional space to within any desired total variation (TV) distance when the ambient distribution is an unknown affine transformation of the $d$-fold product of an (unknown) symmetric one-dimensional logconcave distribution, and the halfspace is introduced by deleting at least an $\epsilon$ fraction of the data in one of the component distributions. Notably, our algorithm does not need labels and establishes the unique (and efficient) identifiability of the hidden halfspace under this distributional assumption. The sample and time complexity of the algorithm are polynomial in the dimension and $1/\epsilon$. The algorithm uses only the first two moments of suitable re-weightings of the empirical distribution, which we call contrastive moments; its analysis uses classical facts about generalized Dirichlet polynomials and relies crucially on a new monotonicity property of the moment ratio of truncations of logconcave distributions. Such algorithms, based only on first and second moments, were suggested in earlier work but have hitherto eluded rigorous guarantees. Prior work addressed the special case when the underlying distribution is Gaussian, via Non-Gaussian Component Analysis. We improve on this by providing polynomial-time guarantees based on TV distance, in place of existing moment-bound guarantees that can be super-polynomial. Our work is also the first to go beyond Gaussians in this setting.
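
The abstract does not spell out the exact reweighting scheme, so the following Python sketch is only a hypothetical illustration of the contrastive-moment idea rather than the paper's algorithm: it whitens the unlabeled data, reweights samples by a Gaussian-like function of their norm (an assumed choice), and reads off a candidate halfspace normal from the deviation of the reweighted first and second moments from their isotropic baseline.

```python
import numpy as np


def contrastive_direction(X, weight=lambda t: np.exp(-t / 2.0)):
    """Hypothetical sketch of a contrastive-moment direction estimate.

    X      : (n, d) array of unlabeled samples.
    weight : per-sample reweighting applied to squared norms; this
             Gaussian-like choice is an illustrative assumption, not the
             weighting used in the paper.
    """
    d = X.shape[1]

    # Step 1: put the data in (approximately) isotropic position using the
    # empirical mean and covariance.
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-12))) @ evecs.T
    Y = (X - mu) @ W

    # Step 2: reweight each sample and form the first two reweighted moments.
    w = weight(np.sum(Y ** 2, axis=1))
    w = w / w.sum()
    m1 = Y.T @ w                         # reweighted mean, shape (d,)
    M2 = (Y * w[:, None]).T @ Y          # reweighted second moment, (d, d)

    # Step 3: contrast the reweighted moments with the isotropic baseline.
    # For an untruncated symmetric product distribution, m1 would vanish and
    # M2 would be a multiple of the identity, so deviations point along the
    # hidden (truncated) component.
    contrast = M2 - (np.trace(M2) / d) * np.eye(d)
    cvals, cvecs = np.linalg.eigh(contrast)
    v = m1 if np.linalg.norm(m1) > 1e-6 else cvecs[:, np.argmax(np.abs(cvals))]

    # Map the candidate direction back to the original coordinates; a normal
    # v for a halfspace in whitened space corresponds to W @ v here.
    u = W @ v
    return u / np.linalg.norm(u)
```

On data drawn from an affine image of a symmetric product logconcave distribution with one truncated coordinate, the returned unit vector is meant only as a candidate normal for the hidden halfspace; the actual algorithm and its polynomial sample and time guarantees depend on the specific reweightings and on the monotonicity property of truncated logconcave moment ratios mentioned above.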
