Deep Clustering with Incomplete Noisy Pairwise Annotations: A Geometric Regularization Approach

The recent integration of deep learning and pairwise similarity annotation-based constrained clustering, i.e., $\textit{deep constrained clustering}$ (DCC), has proven effective for incorporating weak supervision into massive data clustering: similarity annotations on fewer than 1% of the pairs can often substantially improve clustering accuracy. However, beyond these empirical successes, DCC remains poorly understood theoretically. In addition, many DCC paradigms are sensitive to annotation noise, and noisy DCC methods with performance guarantees have remained largely elusive. This work first takes a close look at a recently proposed logistic loss function for DCC and characterizes its theoretical properties. Our result shows that the logistic DCC loss ensures the identifiability of data membership under reasonable conditions, which may shed light on its effectiveness in practice. Building on this understanding, we propose a new loss function based on geometric factor analysis to guard against noisy annotations. We show that even under $\textit{unknown}$ annotation confusions, the data membership can still be $\textit{provably}$ identified under the proposed learning criterion. The proposed approach is tested on multiple datasets to validate our claims.
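
The abstract does not spell out the logistic DCC loss. As a rough illustration only, the sketch below assumes a form common in the pairwise-constrained clustering literature: each sample has a membership vector on the probability simplex, the probability that two samples share a cluster is modeled as the inner product of their membership vectors, and annotated pairs are fit with a binary cross-entropy. The function names, the toy logits, and the softmax parameterization are all hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, so each row is a membership vector on the simplex."""
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pairwise_logistic_loss(memberships, pairs, labels, eps=1e-12):
    """Binary cross-entropy over annotated pairs (assumed form, not the paper's).

    memberships: (N, K) array, rows sum to 1
    pairs:       (P, 2) integer array of annotated sample indices
    labels:      (P,) array, 1 = "same cluster", 0 = "different cluster"
    """
    i, j = pairs[:, 0], pairs[:, 1]
    # Assumed model: P(same cluster) = <m_i, m_j>
    p_same = np.sum(memberships[i] * memberships[j], axis=1)
    p_same = np.clip(p_same, eps, 1.0 - eps)  # keep the logs finite
    bce = labels * np.log(p_same) + (1.0 - labels) * np.log(1.0 - p_same)
    return float(-np.mean(bce))

# Toy data: samples 0 and 1 lean toward cluster 0, sample 2 toward cluster 1.
logits = np.array([[4.0, 0.0], [3.0, 0.0], [0.0, 5.0]])
M = softmax(logits)
pairs = np.array([[0, 1], [0, 2]])

# Annotations that agree with the memberships vs. annotations that contradict them.
loss_consistent = pairwise_logistic_loss(M, pairs, np.array([1.0, 0.0]))
loss_contradictory = pairwise_logistic_loss(M, pairs, np.array([0.0, 1.0]))
print(loss_consistent < loss_contradictory)
```

In a deep model, the memberships would be the softmax output of a network evaluated on the data, and the loss would be minimized over the network parameters. The noise-robust criterion mentioned in the abstract is not sketched here, since its form is not given in this text.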