Supervised Clustering with Structural SVMs

Supervised clustering is the problem of training clustering methods to produce desirable clusterings. Given sets of items and complete clusterings over these sets, a supervised clustering algorithm learns how to cluster future sets of items in a similar fashion, typically by changing the underlying similarity measure between item pairs. This work presents a general approach, based on structural SVMs, for training clustering methods such as correlation clustering and k-means/spectral clustering to optimize task-specific performance criteria. We analyze this supervised clustering approach empirically and theoretically on a variety of datasets and clustering methods. The analysis also yields general insights about structural SVMs beyond supervised clustering. Specifically, because clustering is NP-hard, the corresponding training problem must likewise rely on approximate inference when learning the parameters; we therefore present a detailed theoretical and empirical analysis of the use of approximate inference in structural SVM training in general.
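The abstract describes the recipe only in words. As a rough illustration, here is a minimal Python sketch of supervised correlation clustering trained with a structural-SVM-style objective. It rests on assumptions that are not taken from the paper: the similarity of items i and j is a linear score w·φ(i, j); the joint feature map Ψ sums φ over pairs placed in the same cluster; the loss is the fraction of pairs whose co-membership is wrong (1 minus the Rand index); and, since exact correlation clustering is NP-hard, inference uses a simple greedy merging heuristic. Training uses stochastic subgradient steps on the regularized structural hinge loss, swapped in for brevity in place of a cutting-plane solver. All function names (`psi`, `pair_loss`, `greedy_cc`, `train`, `predict`, `make_example`) and the toy features are hypothetical.

```python
# Hypothetical sketch of supervised correlation clustering; not the authors' implementation.
from itertools import combinations


def dot(w, v):
    return sum(a * b for a, b in zip(w, v))


def same(labels, i, j):
    """True if items i and j are in the same cluster."""
    return labels[i] == labels[j]


def psi(phi, labels, dim):
    """Joint feature map (assumed form): sum of pairwise features over co-clustered pairs."""
    out = [0.0] * dim
    for (i, j), f in phi.items():
        if same(labels, i, j):
            out = [o + fk for o, fk in zip(out, f)]
    return out


def pair_loss(y_true, y):
    """Fraction of item pairs whose co-membership disagrees (equals 1 - Rand index)."""
    pairs = list(combinations(range(len(y_true)), 2))
    wrong = sum(same(y_true, i, j) != same(y, i, j) for i, j in pairs)
    return wrong / len(pairs)


def greedy_cc(n, score):
    """Approximate correlation clustering: repeatedly merge the two clusters whose
    cross-pair scores sum to the largest positive value; stop when no merge helps."""
    clusters = [[i] for i in range(n)]
    while True:
        best, best_gain = None, 0.0
        for a, b in combinations(range(len(clusters)), 2):
            gain = sum(score[(min(i, j), max(i, j))]
                       for i in clusters[a] for j in clusters[b])
            if gain > best_gain:
                best, best_gain = (a, b), gain
        if best is None:
            break
        a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def train(examples, dim, epochs=20, lr=0.1, lam=0.01):
    """Stochastic subgradient descent on the regularized structural hinge loss."""
    w = [0.0] * dim
    for _ in range(epochs):
        for phi, y_true in examples:
            num_pairs = len(phi)
            # The pairwise loss decomposes over pairs, so loss-augmented inference is
            # ordinary clustering with each edge score shifted by +/- 1/num_pairs.
            score = {p: dot(w, f) + (1 - 2 * same(y_true, *p)) / num_pairs
                     for p, f in phi.items()}
            y_hat = greedy_cc(len(y_true), score)  # approximate most-violated clustering
            g_true = psi(phi, y_true, dim)
            g_hat = psi(phi, y_hat, dim)
            w = [wk - lr * (lam * wk + gh - gt)
                 for wk, gh, gt in zip(w, g_hat, g_true)]
    return w


def predict(w, phi, n):
    """Cluster a new item set using the learned pairwise similarity w . phi(i, j)."""
    return greedy_cc(n, {p: dot(w, f) for p, f in phi.items()})


def make_example(attrs):
    """Toy data (hypothetical): phi(i, j) = [attributes match?, bias]; true clusters = attrs."""
    phi = {(i, j): [1.0 if attrs[i] == attrs[j] else 0.0, 1.0]
           for i, j in combinations(range(len(attrs)), 2)}
    return phi, list(attrs)
```

In this toy setting, training on `make_example([0, 0, 1, 1])` with the default parameters drives the match weight positive and the bias weight negative, and `predict` then groups a new set such as `[5, 5, 5, 7, 7]` by attribute. Because `greedy_cc` may miss the true loss-augmented maximizer, each training step can rely on an inexact most-violated clustering, which is the kind of approximation the abstract refers to.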

Thorsten Joachims | Thomas Finley
