Active Nearest-Neighbor Learning in Metric Spaces

We propose a pool-based, non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction-error guarantees that depend on the noisy-margin properties of the input sample and are competitive with those of previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. Our algorithm is based on a generalized sample compression scheme and a new label-efficient active model-selection procedure.
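To make the setting concrete, the following is a minimal sketch of a pool-based learner that compresses the unlabeled pool to a small set of centers, queries a label only for each center, and predicts with the nearest labeled center. It is not the MARMANN procedure: the scale `gamma` is fixed by hand, each center gets a single noiseless query, and the model-selection step mentioned in the abstract is omitted. All names here (`greedy_net`, `active_nn`, `oracle`, `gamma`) are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    # Any metric could be plugged in; Euclidean distance is an assumption.
    return math.dist(a, b)


def greedy_net(pool: Sequence[Point], gamma: float,
               dist: Callable[[Point, Point], float] = euclidean) -> List[int]:
    """Return indices of a greedy gamma-net of the pool.

    Selected centers are pairwise at distance >= gamma, and every pool
    point lies within distance < gamma of some center.
    """
    centers: List[int] = []
    for i, x in enumerate(pool):
        if all(dist(x, pool[c]) >= gamma for c in centers):
            centers.append(i)
    return centers


def active_nn(pool: Sequence[Point], oracle: Callable[[int], int], gamma: float,
              dist: Callable[[Point, Point], float] = euclidean):
    """Query labels only for net centers and return a 1-NN predictor."""
    centers = greedy_net(pool, gamma, dist)
    # One label query per center: this is the only place labels are requested.
    labeled = [(pool[i], oracle(i)) for i in centers]

    def predict(x: Point) -> int:
        # Label of the nearest labeled center.
        return min(labeled, key=lambda pair: dist(x, pair[0]))[1]

    return predict, len(labeled)


# Toy pool on the real line with hidden labels (illustrative data only).
pool = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
hidden_labels = [0, 0, 0, 1, 1, 1]
predict, num_queries = active_nn(pool, lambda i: hidden_labels[i], gamma=0.5)
```

On this toy pool the learner requests labels for two of the six points, whereas a passive learner would receive all six. A larger `gamma` yields fewer queries but a coarser classifier, which is the kind of trade-off a scale-selection step has to resolve.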
