Nonparametric Regression with Comparisons: Escaping the Curse of Dimensionality with Ordinal Information

In supervised learning, we leverage a labeled dataset to design methods for function estimation. In many practical situations, we can obtain alternative feedback, possibly at low cost. A broad goal is to understand the usefulness of, and to design algorithms to exploit, this alternative feedback. We focus on a semi-supervised setting where we obtain additional ordinal (or comparison) information for potentially unlabeled samples. We consider ordinal feedback of varying quality: a perfect ordering of the samples, a noisy ordering of the samples, or noisy pairwise comparisons between samples. We precisely quantify the usefulness of these types of ordinal feedback in nonparametric regression, showing that in many cases it is possible to accurately estimate an underlying function with a very small labeled set, effectively escaping the curse of dimensionality. We develop an algorithm called Ranking-Regression (RR) and analyze its accuracy as a function of the sizes of the labeled and unlabeled datasets and of various noise parameters. We also present lower bounds that establish fundamental limits for the task and show that RR is optimal in a variety of settings. Finally, we present experiments demonstrating the efficacy of RR and investigating its robustness to various sources of noise and to model misspecification.
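The core idea can be sketched roughly as follows (a simplified, hypothetical illustration of ordinal-assisted regression, not the paper's exact procedure): rank the samples using the ordinal feedback, fit a monotone (isotonic) regression to the few observed labels along that ranking, and impute values for unlabeled samples from the nearest labeled position. The function names `pava` and `ranking_regression` below are illustrative, not from the paper.

```python
def pava(values):
    """Pool-adjacent-violators: least-squares nondecreasing fit to a sequence."""
    blocks = []  # each block is [sum, count]; block mean = sum / count
    for v in values:
        blocks.append([v, 1])
        # merge adjacent blocks while their means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

def ranking_regression(order, labels):
    """Sketch of ordinal-assisted estimation.

    order:  indices of all n samples, sorted by the ordinal feedback
            (ascending in the unknown function value).
    labels: dict mapping a small subset of sample indices to noisy labels.
    Returns an estimate y_hat[i] for every sample index i.
    """
    # positions in the ranking where we observed a label
    labeled_pos = [p for p, i in enumerate(order) if i in labels]
    # isotonic fit of the observed labels along the ranking
    iso = pava([labels[order[p]] for p in labeled_pos])
    y_hat = [0.0] * len(order)
    for p, i in enumerate(order):
        # impute from the nearest labeled position in the ranking
        j = min(range(len(labeled_pos)), key=lambda k: abs(labeled_pos[k] - p))
        y_hat[i] = iso[j]
    return y_hat
```

With a perfect ordering of four samples and only two labels, e.g. `ranking_regression([0, 1, 2, 3], {0: 1.0, 3: 2.0})`, every unlabeled sample inherits the isotonic value of its nearest labeled neighbor in the ranking, which conveys how ordinal information can substitute for most of the labels.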
