Attentive Clustering Processes

Amortized approaches to clustering have recently received renewed attention thanks to novel objective functions that exploit the expressiveness of deep learning models. In this work we revisit a recent proposal for fast amortized probabilistic clustering, the Clusterwise Clustering Process (CCP), which yields samples from the posterior distribution of cluster labels for sets of arbitrary size using only O(K) forward network evaluations, where K is the number of clusters and may be arbitrary. While the CCP is adequate for simple datasets, we show that it can severely underfit complex ones, and we hypothesize that this limitation can be traced back to the implicit assumption that the probability of a point joining a cluster is equally sensitive to all the points available to join the same cluster. We propose an improved model, the Attentive Clustering Process (ACP), that selectively pays more attention to relevant points while preserving the invariance properties of the generative model. We illustrate the advantages of the new model in applications to spike sorting in multi-electrode arrays and community discovery in networks. The latter application combines the ACP model with graph convolutional networks and is, to our knowledge, the first deep learning model that handles an arbitrary number of communities.
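The contrast drawn above, between a summary that is equally sensitive to every available point and one that weights relevant points more heavily, can be illustrated with a small sketch. This is not the ACP architecture, only a toy comparison under stated assumptions: the function names `mean_pool` and `attention_pool`, the dot-product scoring rule, and the random data are all hypothetical choices made for illustration. The point is that both forms of pooling stay invariant to the order of the points, so attention can be introduced without giving up that symmetry.

```python
import numpy as np


def mean_pool(points):
    """Equal-weight pooling: every point contributes 1/N to the summary."""
    return points.mean(axis=0)


def attention_pool(points, query):
    """Softmax-weighted pooling: points with a larger dot product
    with `query` receive larger weights (hypothetical scoring rule)."""
    scores = points @ query / np.sqrt(points.shape[1])
    scores = scores - scores.max()  # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ points


# Toy data (assumption): 6 points in 3 dimensions and one query vector.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))
query = rng.normal(size=3)
perm = rng.permutation(len(points))

mean_summary = mean_pool(points)
attn_summary = attention_pool(points, query)

# Reordering the points leaves both summaries unchanged (up to rounding).
mean_is_invariant = np.allclose(mean_summary, mean_pool(points[perm]))
attn_is_invariant = np.allclose(attn_summary, attention_pool(points[perm], query))
print(mean_is_invariant, attn_is_invariant)
```

In this sketch the attention weights depend on the points themselves, so the summary can emphasize a relevant subset, while the mean assigns every point the same influence regardless of content.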
