Distance metric learning with the Universum

Abstract: The Universum, a set of examples that do not belong to any class of interest in a classification problem, has played an important role in improving the performance of many machine learning methods. Since Universum examples are not required to follow the same distribution as the training data, they can encode prior information about the set of possible classifiers. In this paper, we propose a novel distance metric learning method for nearest-neighbor (NN) classification, called U-LMNN, that exploits the prior information contained in the available Universum examples. Building on the large-margin nearest neighbor (LMNN) method, U-LMNN maximizes, for each training example, the margin between its nearest neighbor of the same class and the neighbors from different classes, while controlling the generalization capacity through the number of contradictions on the Universum examples. Experimental results on synthetic as well as real-world data sets demonstrate the good performance of U-LMNN compared with the conventional LMNN method.
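For orientation, a schematic form of the objective described above is sketched below. The first two terms are the standard LMNN criterion, which pulls each example's same-class target neighbors close while pushing differently labeled examples outside a unit margin; the third term is a placeholder Universum penalty \(R_U\). The abstract does not specify the exact form of the Universum term in U-LMNN, so this formulation is an illustrative assumption, not the authors' exact objective.

\[
\min_{M \succeq 0}\;
\sum_{i,\, j \rightsquigarrow i} d_M(x_i, x_j)
\;+\; \mu \sum_{i,\, j \rightsquigarrow i} \sum_{l} \bigl(1 - y_{il}\bigr)
\bigl[\, 1 + d_M(x_i, x_j) - d_M(x_i, x_l) \,\bigr]_{+}
\;+\; \lambda \sum_{u \in \mathcal{U}} R_U\!\bigl(x_u; M\bigr)
\]

Here \(d_M(x, x') = (x - x')^{\top} M (x - x')\) is the learned Mahalanobis distance, \(j \rightsquigarrow i\) denotes that \(x_j\) is a target (same-class) neighbor of \(x_i\), \(y_{il} = 1\) if \(x_i\) and \(x_l\) share a label and \(0\) otherwise, \([\cdot]_{+} = \max(\cdot, 0)\) is the hinge function, \(\mathcal{U}\) is the Universum set, and \(\mu, \lambda \ge 0\) balance the three terms. One natural, but here assumed, choice for \(R_U\) is an \(\varepsilon\)-insensitive loss in the spirit of Universum SVMs, which penalizes a Universum point only when it falls decisively on one side of the local decision boundary, thereby counting it as a "contradiction".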
