Learning A Task-Specific Deep Architecture For Clustering

While sparse coding-based clustering methods have shown to be successful, their bottlenecks in both efficiency and scalability limit the practical usage. In recent years, deep learning has been proved to be a highly effective, efficient and scalable feature learning tool. In this paper, we propose to emulate the sparse coding-based clustering pipeline in the context of deep learning, leading to a carefully crafted deep model benefiting from both. A feed-forward network structure, named TAGnet, is constructed based on a graph-regularized sparse coding algorithm. It is then trained with task-specific loss functions from end to end. We discover that connecting deep learning to sparse coding benefits not only the model performance, but also its initialization and interpretation. Moreover, by introducing auxiliary clustering tasks to the intermediate feature hierarchy, we formulate DTAGnet and obtain a further performance boost. Extensive experiments demonstrate that the proposed model gains remarkable margins over several state-of-the-art methods.

[1]  Shuicheng Yan,et al.  Learning With $\ell ^{1}$-Graph for Image Analysis , 2010, IEEE Transactions on Image Processing.

[2]  Qing Ling,et al.  Multi-Task Learning for Subspace Segmentation , 2015, ICML.

[3]  Chun Chen,et al.  Graph Regularized Sparse Coding for Image Representation , 2011, IEEE Transactions on Image Processing.

[4]  Dale Schuurmans,et al.  Maximum Margin Clustering , 2004, NIPS.

[5]  Yoshua Bengio,et al.  Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach , 2011, ICML.

[6]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[7]  Charu C. Aggarwal,et al.  Heterogeneous Network Embedding via Deep Architectures , 2015, KDD.

[8]  Allen Y. Yang,et al.  Robust Face Recognition via Sparse Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  A. Bruckstein,et al.  K-SVD : An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation , 2005 .

[10]  M. Elad,et al.  $rm K$-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation , 2006, IEEE Transactions on Signal Processing.

[11]  Michael I. Jordan,et al.  On Spectral Clustering: Analysis and an algorithm , 2001, NIPS.

[12]  Gang Chen,et al.  Deep Learning with Nonparametric Clustering , 2015, ArXiv.

[13]  Geoffrey E. Hinton,et al.  On the importance of initialization and momentum in deep learning , 2013, ICML.

[14]  Simon Fong,et al.  A Joint Optimization Framework of Sparse Coding and Discriminative Clustering , 2015, IJCAI.

[15]  Yann LeCun,et al.  Learning Fast Approximations of Sparse Coding , 2010, ICML.

[16]  George Trigeorgis,et al.  A Deep Semi-NMF Model for Learning Hidden Representations , 2014, ICML.

[17]  Rajat Raina,et al.  Efficient sparse coding algorithms , 2006, NIPS.

[18]  K. Schittkowski,et al.  NONLINEAR PROGRAMMING , 2022 .

[19]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[21]  Volker Roth,et al.  Feature Selection in Clustering Problems , 2003, NIPS.

[22]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[23]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[24]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[25]  Fei Wang,et al.  Efficient Maximum Margin Clustering via Cutting Plane Algorithm , 2008, SDM.

[26]  Jiangping Wang,et al.  Data Clustering by Laplacian Regularized L1-Graph , 2014, AAAI.

[27]  Jean Ponce,et al.  Task-Driven Dictionary Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Tao Jiang,et al.  Minimum entropy clustering and applications to gene expression analysis , 2004, Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004..

[29]  Ivor W. Tsang,et al.  Dynamic vehicle routing with stochastic requests , 2003, IJCAI 2003.

[30]  Enhong Chen,et al.  Learning Deep Representations for Graph Clustering , 2014, AAAI.

[31]  Takeo Kanade,et al.  Multi-PIE , 2008, 2008 8th IEEE International Conference on Automatic Face & Gesture Recognition.

[32]  BiernackiChristophe,et al.  Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood , 2000 .

[33]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[34]  Tao Li,et al.  The Relationships Among Various Nonnegative Matrix Factorization Methods for Clustering , 2006, Sixth International Conference on Data Mining (ICDM'06).

[35]  Gérard Govaert,et al.  Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[36]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[37]  Yoshua. Bengio,et al.  Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..

[38]  Sameer A. Nene,et al.  Columbia Object Image Library (COIL100) , 1996 .

[39]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.