Unsupervised Distance Metric Learning Using Predictability

Abhishek A. Gupta (Department of Statistics, University of Pennsylvania, abgupta@wharton.upenn.edu), Dean P. Foster (Department of Statistics, University of Pennsylvania, foster@wharton.upenn.edu), Lyle H. Ungar (Department of Computer and Information Science, University of Pennsylvania, ungar@cis.upenn.edu)

Distance-based learning methods, such as clustering and SVMs, depend on good distance metrics. This paper addresses unsupervised metric learning in the context of clustering. We seek transformations of the data that yield clean, well-separated clusters, where clean clusters are those whose membership can be accurately predicted. The transformation (and hence the distance metric) is obtained by minimizing the blur ratio, defined as the ratio of within-cluster variance to total data variance in the transformed space. For the minimization we propose an iterative procedure, Clustering Predictions of Cluster Membership (CPCM). CPCM alternately (a) predicts cluster memberships (e.g., using linear regression) and (b) clusters these predictions (e.g., using k-means). With linear regression and k-means, the algorithm is guaranteed to converge to a fixed point. The resulting clusters are invariant to linear transformations of the original features and tend to eliminate noise features by driving their weights to zero.

University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-08-23, available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/885
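To make the CPCM alternation concrete, the following sketch implements step (a) with ordinary least squares and step (b) with k-means, stopping when the memberships no longer change. This is a minimal sketch only: the function names cpcm and blur_ratio, the random initialization, and the use of scikit-learn are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch of the CPCM alternation described above (assumed
# implementation, not the authors' code).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression


def blur_ratio(Z, labels):
    """Within-cluster variance of Z divided by its total variance."""
    within = sum(((Z[labels == c] - Z[labels == c].mean(axis=0)) ** 2).sum()
                 for c in np.unique(labels))
    total = ((Z - Z.mean(axis=0)) ** 2).sum()
    return within / total


def cpcm(X, k, max_iter=50, random_state=0):
    """Alternate (a) predicting memberships by linear regression and
    (b) clustering the predictions with k-means, until a fixed point."""
    rng = np.random.default_rng(random_state)
    labels = rng.integers(k, size=len(X))                 # random initial memberships
    for _ in range(max_iter):
        Y = np.eye(k)[labels]                             # one-hot membership indicators
        Y_hat = LinearRegression().fit(X, Y).predict(X)   # (a) predict memberships from X
        new_labels = KMeans(n_clusters=k, n_init=10,
                            random_state=random_state).fit_predict(Y_hat)  # (b) cluster the predictions
        if np.array_equal(new_labels, labels):            # memberships unchanged: fixed point
            break
        labels = new_labels
    return labels, blur_ratio(Y_hat, labels)
```

Calling cpcm(X, k) returns the final hard memberships and the blur ratio of the fitted predictions; on data with irrelevant features, the regression step tends to give them near-zero weight, which is the sense in which the procedure learns a distance metric.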
