Studying Constrained Clustering Problems Using Homotopy Maps

Many constrained clustering algorithms in the literature aim to balance the vector quantization requirements of cluster prototypes against the discrete satisfaction requirements of constraint (must-link or must-not-link) sets. Significant research has been devoted to designing new algorithms for constrained clustering and to understanding when constraints help clustering. However, no method exists to systematically characterize how solution spaces change as constraints are gradually introduced, or to assist practitioners in choosing a sweet spot between vector quantization and constraint satisfaction. We present a homotopy method that smoothly tracks solutions from unconstrained to constrained formulations of clustering. Zero curve tracking begins where the solution is (fairly) well understood, and the curve is then followed into regions where there is only a qualitative understanding of the solution space, finding multiple local minima along the way. Through experiments, we demonstrate that our homotopy method helps identify better tradeoffs and reveals insight into the structure of solution spaces that is not obtainable by pointwise exploration of parameters.
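To make the idea of moving from an unconstrained to a constrained formulation concrete, the following is a minimal sketch and not the method of the paper. It assumes a convex-combination objective (1 - lam) * Q + lam * V, where Q is a soft vector quantization error and V is a soft constraint violation, and it uses naive parameter continuation (re-minimizing at each lam from the previous solution) rather than true zero curve tracking, so it cannot follow turning points or enumerate multiple local minima. The 1-D data, the soft assignments, and every function name (soft_assign, quantization, violation, descend, track) are hypothetical choices made for illustration.

```python
import math

def soft_assign(x, centers, beta=2.0):
    """Soft memberships of a 1-D point: softmax over negative squared distances."""
    dists = [(x - c) ** 2 for c in centers]
    d_min = min(dists)  # shift for numerical stability; the largest weight is 1
    weights = [math.exp(-beta * (d - d_min)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

def quantization(points, centers):
    """Q: mean soft-assigned squared distance to the prototypes."""
    err = 0.0
    for x in points:
        resp = soft_assign(x, centers)
        err += sum(r * (x - c) ** 2 for r, c in zip(resp, centers))
    return err / len(points)

def violation(points, centers, must_link, cannot_link):
    """V: mean probability that a must-link or must-not-link pair is violated."""
    n_pairs = len(must_link) + len(cannot_link)
    if n_pairs == 0:
        return 0.0
    def same(i, j):
        # Probability that points i and j fall in the same cluster.
        ri = soft_assign(points[i], centers)
        rj = soft_assign(points[j], centers)
        return sum(a * b for a, b in zip(ri, rj))
    total = sum(1.0 - same(i, j) for i, j in must_link)
    total += sum(same(i, j) for i, j in cannot_link)
    return total / n_pairs

def objective(lam, points, centers, must_link, cannot_link):
    """Assumed homotopy-style blend: lam = 0 is unconstrained, lam = 1 is constraints only."""
    return ((1.0 - lam) * quantization(points, centers)
            + lam * violation(points, centers, must_link, cannot_link))

def descend(lam, points, centers, must_link, cannot_link, iters=200, h=1e-6):
    """Gradient descent with finite-difference gradients; a step is kept only if it improves."""
    centers = list(centers)
    step = 0.5
    best = objective(lam, points, centers, must_link, cannot_link)
    for _ in range(iters):
        grad = []
        for k in range(len(centers)):
            up = centers[:k] + [centers[k] + h] + centers[k + 1:]
            down = centers[:k] + [centers[k] - h] + centers[k + 1:]
            grad.append((objective(lam, points, up, must_link, cannot_link)
                         - objective(lam, points, down, must_link, cannot_link)) / (2 * h))
        trial = [c - step * g for c, g in zip(centers, grad)]
        value = objective(lam, points, trial, must_link, cannot_link)
        if value < best:
            centers, best = trial, value
        else:
            step *= 0.5  # shrink the step when the trial does not improve
    return centers

def track(points, must_link, cannot_link, start, n_steps=11):
    """Naive continuation: sweep lam from 0 to 1, warm-starting from the previous solution."""
    path = []
    centers = list(start)
    for k in range(n_steps):
        lam = k / (n_steps - 1)
        centers = descend(lam, points, centers, must_link, cannot_link)
        path.append((lam, list(centers),
                     quantization(points, centers),
                     violation(points, centers, must_link, cannot_link)))
    return path

if __name__ == "__main__":
    data = [0.0, 0.2, 0.4, 2.0, 2.2, 2.4]
    for lam, centers, q, v in track(data, [(2, 3)], [(0, 1)], [0.5, 1.5]):
        print(f"lam={lam:.1f} centers={centers} Q={q:.4f} V={v:.4f}")
```

Printing Q and V along the sweep gives a rough picture of the tradeoff between quantization and constraint satisfaction. A pointwise grid over lam without warm starts would not carry the solution from one value to the next, which is the contrast the abstract draws.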
