Learning Neural Models for End-to-End Clustering

We propose a novel end-to-end neural network architecture that, once trained, directly outputs a probabilistic clustering of a batch of input examples in one pass. It estimates a distribution over the number of clusters k, and for each \(1 \le k \le k_\mathrm {max}\), a distribution over the individual cluster assignment for each data point. The network is trained in advance in a supervised fashion on separate data to learn grouping by any perceptual similarity criterion based on pairwise labels (same/different group). It can then be applied to different data containing different groups. We demonstrate promising performance on high-dimensional data like images (COIL-100) and speech (TIMIT). We call this “learning to cluster” and show its conceptual difference to deep metric learning, semi-supervise clustering and other related approaches while having the advantage of performing learnable clustering fully end-to-end.

[1]  Sergey Levine,et al.  End-to-End Training of Deep Visuomotor Policies , 2015, J. Mach. Learn. Res..

[2]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[3]  Fionn Murtagh,et al.  A Survey of Recent Advances in Hierarchical Clustering Algorithms , 1983, Comput. J..

[4]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[5]  Jonathan G. Fiscus,et al.  DARPA TIMIT:: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1 , 1993 .

[6]  Michael I. Jordan,et al.  Distance Metric Learning with Application to Clustering with Side-Information , 2002, NIPS.

[7]  Ali Farhadi,et al.  Unsupervised Deep Embedding for Clustering Analysis , 2015, ICML.

[8]  Daben Liu,et al.  Online speaker clustering , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[9]  Carla Teixeira Lopes,et al.  TIMIT Acoustic-Phonetic Continuous Speech Corpus , 2012 .

[10]  Friedhelm Schwenker,et al.  Three learning phases for radial-basis-function networks , 2001, Neural Networks.

[11]  Ery Arias-Castro,et al.  Clustering Based on Pairwise Distances When the Data is of Mixed Dimensions , 2009, IEEE Transactions on Information Theory.

[12]  Jürgen Schmidhuber,et al.  Neural Expectation Maximization , 2017, NIPS.

[13]  Chong Wang,et al.  Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.

[14]  Derek Greene,et al.  Normalized Mutual Information to evaluate overlapping community finding algorithms , 2011, ArXiv.

[15]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[17]  Sameer A. Nene,et al.  Columbia Object Image Library (COIL100) , 1996 .

[18]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[19]  Tengyu Ma,et al.  On the Ability of Neural Nets to Express Distributions , 2017, COLT.

[20]  Stefanie Jegelka,et al.  Deep Metric Learning via Facility Location , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jonathan G. Fiscus,et al.  Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .

[22]  Kuo-Chin Fan,et al.  A Novel Spectral Clustering Method Based on Pairwise Distance Matrix , 2010, J. Inf. Sci. Eng..

[23]  Jian Wang,et al.  Deep Metric Learning with Angular Loss , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Xiang Yu,et al.  Deep Metric Learning via Lifted Structured Feature Embedding , 2016 .

[25]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[26]  Stefan Winkler,et al.  A data-driven approach to cleaning large face datasets , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[27]  Michael Kampffmeyer,et al.  Deep divergence-based clustering , 2017, 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).

[28]  Daniel Cremers,et al.  Clustering with Deep Learning: Taxonomy and New Methods , 2018, ArXiv.

[29]  Simon Dixon,et al.  An End-to-End Neural Network for Polyphonic Piano Music Transcription , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[30]  Zsolt Kira,et al.  Learning to cluster in order to Transfer across domains and tasks , 2017, ICLR.

[31]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[32]  Xiang Bai,et al.  An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Arindam Banerjee,et al.  Semi-supervised Clustering by Seeding , 2002, ICML.

[34]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[35]  Oliver Durr,et al.  Speaker identification and clustering using convolutional neural networks , 2016, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).

[36]  Xin Jin,et al.  Expectation Maximization Clustering , 2010, Encyclopedia of Machine Learning.

[37]  Jonathan Krause,et al.  3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[38]  Alexander J. Smola,et al.  Sampling Matters in Deep Embedding Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39]  Pietro Perona,et al.  The Ignorant Led by the Blind: A Hybrid Human–Machine Vision System for Fine-Grained Categorization , 2014, International Journal of Computer Vision.

[40]  Dhruv Batra,et al.  Joint Unsupervised Learning of Deep Representations and Image Clusters , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Geoffrey I. Webb,et al.  Encyclopedia of Machine Learning , 2011, Encyclopedia of Machine Learning.

[42]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[43]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[44]  Nir Ailon,et al.  Deep Metric Learning Using Triplet Network , 2014, SIMBAD.

[45]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Oliver Durr,et al.  Learning embeddings for speaker clustering based on voice equality , 2017, 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).