Kernel-Based Distance Metric Learning for Supervised $k$-Means Clustering

Finding an appropriate distance metric that accurately reflects the (dis)similarity between examples is key to the success of $k$-means clustering. Since specifying a good distance metric by hand is not always easy, one can instead learn it from prior knowledge in the form of available clustered data sets, an approach referred to as supervised clustering. In this paper, a kernel-based distance metric learning method is developed to improve the practical use of $k$-means clustering. For the corresponding optimization problem, we derive a meaningful Lagrange dual formulation and introduce an efficient algorithm to reduce the training complexity. Our formulation is simple to implement, allowing a large-scale distance metric learning problem to be solved in a computationally tractable way. Experimental results show that the proposed method yields more robust and better performance on synthetic as well as real-world data sets compared with other state-of-the-art distance metric learning methods.
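To make the underlying idea concrete, the following is a minimal sketch (not the paper's learning algorithm) of $k$-means under a learned Mahalanobis metric $d_M(x,y)^2 = (x-y)^\top M (x-y)$ with $M = L^\top L$ positive semidefinite: clustering with $d_M$ is equivalent to mapping each point $x \mapsto Lx$ and running ordinary Euclidean $k$-means in the transformed space. The function name and data here are hypothetical, for illustration only.

```python
import numpy as np

def mahalanobis_kmeans(X, k, L, n_iter=100, seed=0):
    """Lloyd's algorithm on the linearly transformed data Z = X @ L.T.

    Running Euclidean k-means on Z is equivalent to k-means on X under
    the Mahalanobis metric induced by M = L.T @ L.
    """
    rng = np.random.default_rng(seed)
    Z = X @ L.T                                   # apply the metric factor L
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center (Euclidean in Z-space)
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # recompute centers as cluster means
        new_centers = np.array([Z[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy data: two well-separated blobs; L = identity reduces to plain k-means.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels = mahalanobis_kmeans(X, k=2, L=np.eye(2))
```

A learned metric would replace `np.eye(2)` with the factor $L$ obtained from the supervised training stage; the point of the sketch is only that, once $M$ is fixed, clustering itself reduces to standard $k$-means in a linearly transformed space.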
