Learning Bregman Distance Functions for Semi-Supervised Clustering

Learning distance functions with side information plays a key role in many data mining applications. Conventional distance metric learning approaches often assume that the target distance function takes the form of a Mahalanobis distance. These approaches usually work well when the data are low-dimensional, but often become computationally expensive or even infeasible on high-dimensional data. In this paper, we propose a novel scheme for learning nonlinear distance functions with side information. It aims to learn a Bregman distance function using a nonparametric approach similar to Support Vector Machines. We emphasize that the proposed scheme is more general than the conventional approach to distance metric learning and is able to handle high-dimensional data efficiently. We verify the efficacy of the proposed distance learning method with extensive experiments on semi-supervised clustering. A comparison with state-of-the-art approaches for learning distance functions with side information reveals clear advantages of the proposed technique.
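
To make the generality claim concrete, the sketch below evaluates the standard textbook Bregman distance d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for a convex function phi, and checks that the quadratic choice phi(x) = x^T A x recovers a (squared) Mahalanobis distance. This is not the learning scheme proposed in the paper, which learns phi nonparametrically. The matrix A, the points x and y, and all helper names are illustrative assumptions.

```python
# Minimal sketch: the generic Bregman distance, with a fixed quadratic phi.
# A, x, y and every function name here are hypothetical, chosen for illustration.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in M]

def bregman_distance(phi, grad_phi, x, y):
    """d_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    diff = [a - b for a, b in zip(x, y)]
    return phi(x) - phi(y) - dot(grad_phi(y), diff)

# Assumed symmetric positive definite matrix defining phi(x) = x^T A x.
A = [[2.0, 0.5],
     [0.5, 1.0]]

def phi(x):
    return dot(x, matvec(A, x))

def grad_phi(x):
    # Gradient of x^T A x is 2 A x when A is symmetric.
    return [2.0 * g for g in matvec(A, x)]

x = [1.0, 2.0]
y = [0.0, -1.0]

d_bregman = bregman_distance(phi, grad_phi, x, y)

# Squared Mahalanobis form (x - y)^T A (x - y) for comparison.
diff = [a - b for a, b in zip(x, y)]
d_mahalanobis = dot(diff, matvec(A, diff))

print(d_bregman, d_mahalanobis)
```

Other convex choices of phi give distances that are not of Mahalanobis form, which is the sense in which a Bregman distance function is the more general family.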
