Semi-Supervised Affinity Propagation with Soft Instance-Level Constraints

Soft-constraint semi-supervised affinity propagation (SCSSAP) adds supervision to the affinity propagation (AP) clustering algorithm without strictly enforcing instance-level constraints. Instead, constraint violations adjust the AP similarity matrix at every iteration of the proposed algorithm and add a penalty to the objective function. This formulation is particularly advantageous in the presence of noisy labels or noisy constraints, since the penalty parameter of SCSSAP can be tuned to express our confidence in the instance-level constraints. When the constraints are noiseless, SCSSAP outperforms unsupervised AP and performs at least as well as the previously proposed semi-supervised AP and constrained expectation maximization. In the presence of label and constraint noise, SCSSAP yields a more accurate clustering than either of the aforementioned established algorithms. Finally, we present an extension of SCSSAP that incorporates metric learning into the optimization objective and can further improve clustering performance.
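The soft-constraint idea can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so everything below is an assumption made for illustration: the function name `penalize_violations`, the additive form of the adjustment, and the use of a single scalar `penalty` for both must-link and cannot-link pairs are hypothetical and are not taken from the paper. Given the cluster labels produced by the current iteration, each violated constraint shifts the corresponding similarity entry and contributes to a penalty total.

```python
import numpy as np

def penalize_violations(S, labels, must_link, cannot_link, penalty):
    """Return (adjusted similarity matrix, total penalty) for the given labels.

    Hypothetical additive rule, assumed for illustration only.
    """
    S_adj = np.array(S, dtype=float)  # work on a copy; S itself is left untouched
    total = 0.0
    for i, j in must_link:
        if labels[i] != labels[j]:
            # Violated must-link: raise the pair's similarity to pull the points together.
            S_adj[i, j] += penalty
            S_adj[j, i] += penalty
            total += penalty
    for i, j in cannot_link:
        if labels[i] == labels[j]:
            # Violated cannot-link: lower the pair's similarity to push the points apart.
            S_adj[i, j] -= penalty
            S_adj[j, i] -= penalty
            total += penalty
    return S_adj, total

# Toy example: negative squared distances between three points on a line.
S = -np.array([[0.0, 1.0, 4.0],
               [1.0, 0.0, 1.0],
               [4.0, 1.0, 0.0]])
labels = [0, 0, 1]  # assumed assignment from the current iteration
S_adj, cost = penalize_violations(
    S, labels, must_link=[(1, 2)], cannot_link=[(0, 1)], penalty=2.0
)
```

In this sketch a small `penalty` lets the data-driven similarities dominate, which corresponds to low confidence in the constraints, while a large `penalty` approaches hard enforcement. The adjusted matrix would then be passed to the next round of AP message passing.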
