Non-parametric binary regression in metric spaces with KL loss

We propose a non-parametric variant of binary regression, where the hypothesis is regularized to be a Lipschitz function mapping a metric space to [0,1] and the loss is logarithmic. This setting presents novel computational and statistical challenges. On the computational front, we derive an efficient optimization algorithm based on interior-point methods; an attractive feature is that it is parameter-free (i.e., it does not require tuning an update step size). On the statistical front, the unbounded loss function presents a problem for classic generalization bounds based on covering-number and Rademacher techniques. We circumvent this challenge via an adaptive truncation approach, and present a lower bound indicating that the truncation is, in some sense, necessary.
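
To make the setting concrete, here is a minimal formalization under standard notation; the symbols (X, rho), L, and the clipping threshold epsilon are our own illustrative choices, not necessarily the paper's. The last display sketches why truncation restores boundedness of the loss, which is what classic covering-number and Rademacher arguments require.

% Hypothesis class: L-Lipschitz functions from a metric space (X, rho) into [0,1].
\[
  \mathcal{F}_L \;=\; \bigl\{\, f : \mathcal{X} \to [0,1] \;:\; |f(x) - f(x')| \le L\,\rho(x,x') \ \ \forall\, x, x' \in \mathcal{X} \,\bigr\}
\]
% Logarithmic (cross-entropy / KL) loss for a binary label y in {0,1}:
\[
  \ell\bigl(f(x), y\bigr) \;=\; -\,y \log f(x) \;-\; (1-y)\log\bigl(1 - f(x)\bigr)
\]
% The loss is unbounded as f(x) approaches 0 or 1. Clipping predictions to
% [epsilon, 1 - epsilon] caps the loss at log(1/epsilon):
\[
  f_\varepsilon(x) \;=\; \min\bigl\{\max\{f(x), \varepsilon\},\, 1 - \varepsilon\bigr\},
  \qquad
  \ell\bigl(f_\varepsilon(x), y\bigr) \;\le\; \log \tfrac{1}{\varepsilon}.
\]

Presumably the adaptive truncation trades this log(1/epsilon) bound against the bias that clipping introduces; the details of how epsilon is chosen are the paper's, not reproduced here.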
