Learning from pairwise constraints by Similarity Neural Networks

In this paper we present Similarity Neural Networks (SNNs), a neural network model that learns a similarity measure for pairs of patterns from binary supervision on their similarity/dissimilarity relationships. Pairwise relationships, also referred to as pairwise constraints, generally contain less information than class labels but, in some contexts, are easier to obtain from human supervisors. The SNN architecture guarantees the basic properties of a similarity measure (symmetry and non-negativity), and it can deal with non-transitivity of the similarity criterion. Unlike the majority of the metric learning algorithms proposed so far, it can model non-linear relationships among data while still providing a natural out-of-sample extension to novel pairs of patterns. The theoretical properties of SNNs and their application to semi-supervised clustering are investigated. In particular, we introduce a novel technique that allows the clustering algorithm to compute the optimal representatives of a data partition by means of backpropagation on the input layer, biased by an L2 norm regularizer. An extensive set of experimental results is provided to compare SNNs with the most popular similarity learning algorithms. On both benchmarks and real-world data, SNNs and SNN-based clustering show improved performance, confirming the advantage of the proposed neural network approach to similarity measure learning.
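
The two ideas above (a network whose output is symmetric and non-negative by construction, and a cluster representative obtained by gradient steps on the input) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the class `SymmetricSimilarityNet`, the functions `objective` and `find_representative`, the mirrored-weight hidden layer used to obtain symmetry, the sigmoid output used to obtain non-negativity, and the particular form of the L2 term (a penalty on the squared distance of the representative from the cluster mean). The weights are random rather than trained on pairwise constraints.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SymmetricSimilarityNet:
    """Hypothetical one-hidden-layer similarity network (illustration only)."""

    def __init__(self, dim, hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Random, untrained weights; a real model would be fit to pairwise constraints.
        self.A = rng.normal(scale=0.5, size=(hidden, dim))
        self.B = rng.normal(scale=0.5, size=(hidden, dim))
        self.b = rng.normal(scale=0.1, size=hidden)
        self.v = rng.normal(scale=0.5, size=hidden)

    def _hidden(self, x, y):
        # The same weights see (x, y) and the swapped pair (y, x).
        t1 = np.tanh(self.A @ x + self.B @ y + self.b)
        t2 = np.tanh(self.A @ y + self.B @ x + self.b)
        return t1, t2

    def similarity(self, x, y):
        # Summing the mirrored activations makes the output symmetric in (x, y);
        # the sigmoid keeps it in (0, 1), hence non-negative.
        t1, t2 = self._hidden(x, y)
        return sigmoid(self.v @ (t1 + t2))

    def grad_wrt_first_input(self, x, y):
        # Backpropagation down to the input layer: d similarity / d x.
        t1, t2 = self._hidden(x, y)
        f = sigmoid(self.v @ (t1 + t2))
        ds_dx = self.A.T @ (self.v * (1.0 - t1**2)) + self.B.T @ (self.v * (1.0 - t2**2))
        return f * (1.0 - f) * ds_dx


def objective(net, r, points, lam):
    # Mean similarity of r to the cluster members, minus an assumed L2 penalty.
    mean_sim = np.mean([net.similarity(r, p) for p in points])
    return mean_sim - lam * np.sum((r - points.mean(axis=0)) ** 2)


def find_representative(net, points, lam=0.1, lr=0.05, steps=200):
    # Gradient ascent on the input vector r, starting from the cluster mean.
    mu = points.mean(axis=0)
    r = mu.copy()
    for _ in range(steps):
        g = np.mean([net.grad_wrt_first_input(r, p) for p in points], axis=0)
        r += lr * (g - 2.0 * lam * (r - mu))
    return r


net = SymmetricSimilarityNet(dim=3, hidden=4)
cluster = np.random.default_rng(1).normal(size=(5, 3))  # toy data, not from the paper
rep = find_representative(net, cluster)
```

Because the representative is a free input vector rather than one of the data points, it need not coincide with any cluster member; the penalty weight `lam` (a hypothetical parameter here) controls how far it may drift from the cluster mean.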
