Learning and 1-bit Compressed Sensing under Asymmetric Noise

We study the approximate recovery problem: given corrupted 1-bit measurements of the form sign(w^*⋅x_i), recover a vector w that is a good approximation to w^* ∈ R^d. This problem has been studied by both the learning theory and signal processing communities. In learning theory it is known as the problem of learning halfspaces with noise; in signal processing it is known as 1-bit compressed sensing, where there is the additional assumption that w^* is t-sparse. The challenge in both cases is to design computationally efficient algorithms that tolerate large amounts of noise under realistic noise models. Moreover, in the case of 1-bit compressed sensing, the number of measurements x_i must scale polynomially in t and only polylogarithmically in d, the ambient dimension. In this work, we introduce algorithms with nearly optimal guarantees for both problems under two realistic noise models, bounded (Massart) noise and adversarial (agnostic) noise, when the measurements x_i are drawn from any isotropic log-concave distribution. Under bounded (Massart) noise, an adversary can flip the measurement of each point x with probability η(x) ≤ η < 1/2.
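As a minimal sketch of the setup (not the paper's algorithm), the following Python snippet draws measurements from a standard Gaussian, one example of an isotropic log-concave distribution, generates a t-sparse unit target w^*, corrupts the 1-bit labels with Massart noise whose flip rate depends only on the distance |w^*⋅x| to the hyperplane, and recovers the direction of w^* with a simple averaging baseline. All parameter values (d, t, n, η) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, t, n, eta = 500, 10, 5000, 0.2  # hypothetical: ambient dim, sparsity, samples, noise bound

# t-sparse unit target w*
w_star = np.zeros(d)
support = rng.choice(d, size=t, replace=False)
w_star[support] = rng.standard_normal(t)
w_star /= np.linalg.norm(w_star)

# Measurements from an isotropic log-concave distribution
# (standard Gaussian is one example).
X = rng.standard_normal((n, d))
margins = X @ w_star
y = np.sign(margins)

# Massart (bounded) noise: flip each label with probability
# eta(x) <= eta < 1/2; here the rate shrinks with the distance
# |w* . x| to the hyperplane, so noise concentrates near the boundary.
flip_prob = eta * np.exp(-np.abs(margins))
y[rng.random(n) < flip_prob] *= -1

# Averaging baseline: because eta(x) here depends only on |w* . x|,
# E[y x] stays proportional to w* by symmetry, so the normalized
# empirical mean estimates the direction of w*.
w_hat = (y[:, None] * X).mean(axis=0)
w_hat /= np.linalg.norm(w_hat)

print("angle error (radians):", np.arccos(np.clip(w_hat @ w_star, -1.0, 1.0)))
```

This baseline only illustrates that the direction of w^* remains recoverable under symmetric Massart corruption; the algorithms introduced in the paper achieve the stated near-optimal guarantees under both the Massart and adversarial (agnostic) noise models, which the averaging estimator does not.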
