Margin distribution explanation on metric learning for nearest neighbor classification

The importance of metrics in machine learning and pattern recognition algorithms has led to increasing interest in optimizing distance metrics in recent years. Most state-of-the-art methods focus on learning Mahalanobis distances, and the learned metrics are in turn heavily used for nearest neighbor (NN) classification. However, no theoretical link has yet been established between the learned metrics and their performance in NN. Although some existing methods, such as large margin nearest neighbor (LMNN), have employed the concept of a large margin to learn a data-dependent metric, the link between the margin and the generalization performance of the metric is not fully understood. Recent work has provided a tenable margin distribution explanation for Boosting, but the margin used in metric learning is quite different from the one used in Boosting. In this paper, we therefore analyze the effectiveness of metric learning algorithms for NN from the perspective of the margin distribution and provide a general and effective evaluation criterion for metric learning. First, we derive an upper bound on the generalization error of NN with respect to the Mahalanobis metric. Second, experiments with existing metric learning algorithms on several benchmark datasets show that these algorithms can obtain a large margin distribution. Motivated by this analysis, we also present a novel margin-based metric learning algorithm for NN that explicitly enlarges the margin distribution on various datasets and achieves results highly competitive with those of existing metric learning algorithms.
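To make the terms above concrete, here is a minimal Python sketch of NN classification under a Mahalanobis metric d_M(x, z) = sqrt((x - z)^T M (x - z)), with M positive semidefinite, together with an empirical margin distribution. The abstract does not define the NN margin, so the sketch assumes a hypothesis-margin style definition (distance to the nearest point of another class minus distance to the nearest point of the same class, computed leave-one-out); this is an illustrative assumption, not necessarily the definition used in the paper. The function names, toy data, and the hand-picked matrix standing in for a learned metric are all hypothetical, and no metric learning algorithm is implemented here.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def mahalanobis(x: Sequence[float], z: Sequence[float], M: Matrix) -> float:
    """Mahalanobis distance sqrt((x - z)^T M (x - z)).

    M is assumed to be symmetric positive semidefinite; this is not checked.
    """
    diff = [a - b for a, b in zip(x, z)]
    quad = sum(diff[i] * M[i][j] * diff[j]
               for i in range(len(diff)) for j in range(len(diff)))
    return math.sqrt(max(quad, 0.0))  # clamp tiny negative round-off


def predict_1nn(X: Sequence[Sequence[float]], y: Sequence[int],
                M: Matrix, query: Sequence[float]) -> int:
    """1-NN prediction: label of the training point closest to `query` under M."""
    best = min(range(len(X)), key=lambda j: mahalanobis(query, X[j], M))
    return y[best]


def nn_margins(X: Sequence[Sequence[float]], y: Sequence[int],
               M: Matrix) -> List[float]:
    """Leave-one-out margin of every training point under M.

    Hypothetical definition for illustration:
        margin(x_i) = d_M(x_i, nearest miss) - d_M(x_i, nearest hit).
    A positive margin means leave-one-out 1-NN labels x_i correctly;
    a negative margin means it errs. Assumes at least two classes and
    at least two points per class.
    """
    margins = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        hit = miss = math.inf
        for j, (xj, yj) in enumerate(zip(X, y)):
            if j == i:
                continue
            d = mahalanobis(xi, xj, M)
            if yj == yi:
                hit = min(hit, d)
            else:
                miss = min(miss, d)
        margins.append(miss - hit)
    return margins


def margin_cdf(margins: Sequence[float], theta: float) -> float:
    """Empirical margin distribution: fraction of points with margin <= theta."""
    return sum(1 for m in margins if m <= theta) / len(margins)


if __name__ == "__main__":
    # Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 0.0], [0.2, 3.0], [1.0, 0.5], [1.2, 2.5]]
    y = [0, 0, 1, 1]
    identity = [[1.0, 0.0], [0.0, 1.0]]      # plain Euclidean distance
    reweighted = [[1.0, 0.0], [0.0, 0.01]]   # hand-picked M that shrinks feature 1

    for name, M in [("Euclidean", identity), ("reweighted", reweighted)]:
        m = nn_margins(X, y, M)
        print(name, "fraction of non-positive margins:", margin_cdf(m, 0.0))
        print(name, "prediction for [0.1, 1.5]:", predict_1nn(X, y, M, [0.1, 1.5]))
```

On this toy data every point has a negative leave-one-out margin under the Euclidean metric, and the query is assigned to class 1. Under the reweighted M all margins are positive, and the query is assigned to class 0. Shifting the margin distribution toward larger values in this way is the kind of behavior the abstract attributes to metric learning, although here M is chosen by hand rather than learned.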
