Non-parametric binary regression in metric spaces with KL loss

We propose a non-parametric variant of binary regression, where the hypothesis is regularized to be a Lipschitz function mapping a metric space to [0,1] and the loss is logarithmic. This setting presents novel computational and statistical challenges. On the computational front, we derive an efficient optimization algorithm based on interior-point methods; an attractive feature is that it is parameter-free (i.e., it does not require tuning an update step size). On the statistical front, the unbounded loss function presents a problem for classic generalization bounds based on covering-number and Rademacher techniques. We circumvent this challenge via an adaptive truncation approach, and present a lower bound indicating that the truncation is, in some sense, necessary.
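
To make the setting concrete, here is a minimal formalization under standard notation; the symbols (X, rho), L, and the clipping threshold epsilon are our own illustrative choices, not necessarily the paper's. The last display sketches why truncation restores boundedness of the loss, which is what classic covering-number and Rademacher arguments require.

% Hypothesis class: L-Lipschitz functions from a metric space (X, rho) into [0,1].
\[
  \mathcal{F}_L \;=\; \bigl\{\, f : \mathcal{X} \to [0,1] \;:\; |f(x) - f(x')| \le L\,\rho(x,x') \ \ \forall\, x, x' \in \mathcal{X} \,\bigr\}
\]
% Logarithmic (cross-entropy / KL) loss for a binary label y in {0,1}:
\[
  \ell\bigl(f(x), y\bigr) \;=\; -\,y \log f(x) \;-\; (1-y)\log\bigl(1 - f(x)\bigr)
\]
% The loss is unbounded as f(x) approaches 0 or 1. Clipping predictions to
% [epsilon, 1 - epsilon] caps the loss at log(1/epsilon):
\[
  f_\varepsilon(x) \;=\; \min\bigl\{\max\{f(x), \varepsilon\},\, 1 - \varepsilon\bigr\},
  \qquad
  \ell\bigl(f_\varepsilon(x), y\bigr) \;\le\; \log \tfrac{1}{\varepsilon}.
\]

Presumably the adaptive truncation trades this log(1/epsilon) bound against the bias that clipping introduces; the details of how epsilon is chosen are the paper's, not reproduced here.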
