Deep Kernel Learning for Clustering

We propose a deep learning approach for discovering kernels tailored to identifying clusters over sample data. Our neural network produces sample embeddings that are motivated by, and at least as expressive as, spectral clustering. Our training objective, based on the Hilbert-Schmidt Independence Criterion, can be optimized via gradient methods adapted to the Stiefel manifold, yielding significant acceleration over spectral methods that rely on eigendecompositions. Finally, our trained embedding can be applied directly to out-of-sample data. We show experimentally that our approach outperforms several state-of-the-art deep clustering methods, as well as traditional approaches such as $k$-means and spectral clustering, on a broad array of real-life and synthetic datasets.
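As a point of reference for the training objective mentioned above, the following is a minimal sketch of the standard empirical Hilbert-Schmidt Independence Criterion (HSIC) estimator between two kernel matrices, $\mathrm{HSIC}(K, L) = \mathrm{tr}(KHLH)/(n-1)^2$ with centering matrix $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$. This illustrates only the generic dependence measure, not the paper's specific network architecture or its Stiefel-manifold optimization; the RBF kernel and toy data here are illustrative assumptions.

```python
import numpy as np

def hsic(K, L):
    """Empirical HSIC between two n x n kernel matrices K and L."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def rbf(a, b, gamma=1.0):
    """RBF (Gaussian) kernel matrix between row-sample arrays a and b."""
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

# Toy check: HSIC is larger for dependent data than for independent data.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 1))
K = rbf(x, x)
L_dep = rbf(2 * x, 2 * x)                        # deterministic function of x
z = rng.normal(size=(50, 1))                     # independent draw
L_ind = rbf(z, z)
print(hsic(K, L_dep), hsic(K, L_ind))
```

In a clustering objective of this kind, one kernel is induced by the learned embedding and the other by a candidate cluster assignment, and the embedding is trained to maximize their dependence.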
