Functions with average smoothness: structure, algorithms, and learning

We initiate a program of average-smoothness analysis for efficiently learning real-valued functions on metric spaces. Rather than using the (global) Lipschitz constant as the regularizer, we define a local slope at each point and gauge a function's complexity as the average of these values. Since the average is often much smaller than the maximum, this complexity measure can yield considerably sharper generalization bounds, provided the bounds admit a refinement in which the global Lipschitz constant is replaced by the average of the local slopes. Our first major contribution is to obtain just such distribution-sensitive bounds. This required overcoming a number of technical challenges, perhaps the most significant of which was bounding the {\em empirical} covering numbers, which can be much worse-behaved than the ambient ones. This bound, in turn, rests on a novel Lipschitz-type extension that pointwise minimizes the local slope and may be of independent interest. Our combinatorial results are accompanied by efficient algorithms for denoising the random sample, as well as by guarantees that the extension from the sample to the whole space will continue to be, with high probability, smooth on average. Along the way we uncover a surprisingly rich combinatorial and analytic structure in the function class we define.
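The contrast between the global Lipschitz constant and the average of local slopes can be seen on a small numerical example. The sketch below is our own illustration, not the paper's construction: it takes the empirical local slope at a sample point to be the largest difference quotient against the other sample points (assuming Euclidean distance), and compares the maximum of these slopes with their mean for a function that is steep only on a small region.

```python
import numpy as np

def local_slope(points, values, i):
    """Empirical local slope of f at sample point i:
    max over j != i of |f(x_i) - f(x_j)| / dist(x_i, x_j).
    (An illustrative finite-sample analogue of the pointwise slope.)"""
    diffs = np.abs(values - values[i])
    dists = np.linalg.norm(points - points[i], axis=1)
    mask = dists > 0  # exclude the point itself
    return np.max(diffs[mask] / dists[mask])

# f(x) = tanh(50 x) on [-1, 1]: very steep near 0, nearly flat elsewhere.
x = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
y = np.tanh(50.0 * x).ravel()

slopes = np.array([local_slope(x, y, i) for i in range(len(x))])
lipschitz = slopes.max()   # global (empirical) Lipschitz constant
average = slopes.mean()    # average smoothness: much smaller here
```

Only the few points inside the steep band attain large slopes, so the mean stays an order of magnitude below the maximum; a regularizer based on the average therefore assigns this function a far smaller complexity than the Lipschitz constant does.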
