Semi-crowdsourced Clustering with Deep Generative Models

We consider the semi-supervised clustering problem in which crowdsourcing provides noisy pairwise comparisons on a small subset of the data, i.e., whether a pair of samples belongs to the same cluster. We propose a new approach that combines a deep generative model (DGM), which characterizes the low-level features of the data, with a statistical relational model for the noisy pairwise annotations on that subset. The two parts share the latent variables. To let the model automatically trade off its complexity against its fit to the data, we also develop a fully Bayesian variant. We address the inference challenge with fast (natural-gradient) stochastic variational inference algorithms, which combine variational message passing for the relational part and amortized learning of the DGM under a unified framework. Empirical results on synthetic and real-world datasets show that our model outperforms previous crowdsourced clustering methods.
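To make the problem setting concrete, the following is a minimal sketch of how noisy pairwise annotations on a subset of the data might be simulated. It is not the relational model of the paper: the per-annotator `sensitivity` and `specificity` parameters, the function name, and the uniform sampling of pairs and annotators are all assumptions chosen for illustration.

```python
import random


def simulate_pairwise_annotations(labels, sensitivity, specificity, n_pairs, seed=0):
    """Simulate noisy "same cluster?" answers from crowd annotators.

    Assumed noise model (illustrative only): annotator m answers "same"
    with probability sensitivity[m] when the pair truly shares a cluster,
    and with probability 1 - specificity[m] when it does not.

    Returns a list of (i, j, annotator, answer) tuples, answer in {0, 1}.
    """
    rng = random.Random(seed)
    n_items = len(labels)
    n_annotators = len(sensitivity)
    annotations = []
    for _ in range(n_pairs):
        i, j = rng.sample(range(n_items), 2)   # two distinct items
        m = rng.randrange(n_annotators)        # annotator chosen uniformly
        same_cluster = labels[i] == labels[j]
        p_same = sensitivity[m] if same_cluster else 1.0 - specificity[m]
        answer = 1 if rng.random() < p_same else 0
        annotations.append((i, j, m, answer))
    return annotations


# Hypothetical toy data: six items in two clusters, two annotators.
true_labels = [0, 0, 0, 1, 1, 1]
noisy = simulate_pairwise_annotations(
    true_labels, sensitivity=[0.9, 0.7], specificity=[0.8, 0.6], n_pairs=10
)
```

In the setting described above, only tuples like these and the raw features would be observed; the cluster labels would be latent and shared between the feature model and the annotation model.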
