S-Concave Distributions: Towards Broader Distributions for Noise-Tolerant and Sample-Efficient Learning Algorithms

We provide new results concerning noise-tolerant and sample-efficient learning algorithms under $s$-concave distributions over $\mathbb{R}^n$ for $-\frac{1}{2n+3}\le s\le 0$. The class of $s$-concave distributions is a broad and natural generalization of log-concavity and includes many important additional distributions, e.g., the Pareto and $t$-distributions. This class has been studied in the context of efficient sampling, integration, and optimization, but much remains unknown about the geometry of these distributions and their applications to learning. The challenge is that, unlike the distributions commonly used in learning (uniform or, more generally, log-concave distributions), this broader class is not closed under the marginalization operator, and many of its members are fat-tailed. In this work, we introduce new convex geometry tools to study the properties of $s$-concave distributions, and we use these properties to bound quantities of interest to learning, including the probability of disagreement between two halfspaces, the disagreement outside a band, and the disagreement coefficient. We use these results to significantly generalize prior results for margin-based active learning, disagreement-based active learning, and passive learning of intersections of halfspaces. Our analysis of the geometric properties of $s$-concave distributions may be of independent interest to optimization more broadly.
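For concreteness, we recall the standard definition from the sampling literature (included here for reference). A density $f$ over $\mathbb{R}^n$ is $s$-concave, for $s < 0$, if for all $x, y \in \mathbb{R}^n$ and all $\lambda \in [0,1]$,

$$f\big(\lambda x + (1-\lambda) y\big) \;\ge\; \Big(\lambda\, f(x)^s + (1-\lambda)\, f(y)^s\Big)^{1/s},$$

or equivalently, $f^s$ is convex (the exponent $s < 0$ reverses the order). Taking the limit $s \to 0$ recovers the log-concave condition $f(\lambda x + (1-\lambda) y) \ge f(x)^{\lambda} f(y)^{1-\lambda}$; since the power mean with negative exponent is dominated by the geometric mean, every log-concave density is $s$-concave for every $s \le 0$, and the classes grow as $s$ decreases. This is how heavy-tailed examples such as the Pareto and $t$-distributions, which are not log-concave, enter the class.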
