Large-Margin Metric Learning for Constrained Partitioning Problems

We consider unsupervised partitioning problems based explicitly or implicitly on the minimization of Euclidean distortions, such as clustering, image or video segmentation, and other change-point detection problems. We focus on cases with specific structure, which cover many practical situations ranging from mean-based change-point detection to image segmentation. We aim to learn a Mahalanobis metric for these unsupervised problems, leading to feature weighting and/or selection. This is done in a supervised way by assuming the availability of several (partially) labeled datasets that share the same metric. We cast the metric learning problem as a large-margin structured prediction problem, with properly defined regularizers and losses, leading to a convex optimization problem that can be solved efficiently. Our experiments show that learning the metric can significantly improve performance on bioinformatics, video, and image segmentation problems.
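To make the role of the metric concrete, the following minimal sketch computes the distortion of a partition under a diagonal Mahalanobis metric, i.e. a per-feature weighting. It is an illustration only, not the learning procedure of the paper: the restriction to a diagonal metric, the function name `distortion`, the toy points, and the weight values are all assumptions chosen for the example.

```python
def distortion(points, labels, weights):
    """Sum of weighted squared distances from each point to its cluster mean.

    `weights` is the diagonal of a Mahalanobis metric (assumed diagonal here);
    weights of all ones give the usual Euclidean distortion.
    """
    dim = len(weights)
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    total = 0.0
    for members in clusters.values():
        mean = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        for p in members:
            total += sum(weights[j] * (p[j] - mean[j]) ** 2 for j in range(dim))
    return total


# Hypothetical toy data: feature 0 carries the desired grouping,
# feature 1 is a large-scale nuisance feature.
points = [(0.0, 0.0), (0.2, 5.0), (1.0, 0.1), (1.2, 5.1)]
desired = [0, 0, 1, 1]   # grouping by feature 0
nuisance = [0, 1, 0, 1]  # grouping by feature 1

euclidean = [1.0, 1.0]
reweighted = [1.0, 0.001]  # hand-picked weights, not learned

for name, w in [("euclidean", euclidean), ("reweighted", reweighted)]:
    print(name, distortion(points, desired, w), distortion(points, nuisance, w))
```

With unit weights the nuisance partition has the lower distortion, so a distortion-minimizing method would prefer it; down-weighting feature 1 reverses the ordering. A learned metric is meant to play the role of the hand-picked weights here.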
