Integrating Tensor Similarity to Enhance Clustering Performance

The performance of most clustering methods hinges on the pairwise affinity they use, which is usually represented by a similarity matrix. However, pairwise similarity is notoriously vulnerable to noise contamination and to imbalance in samples or features, which hinders accurate clustering. To tackle this issue, we propose to use information among samples to boost clustering performance. We prove that a simplified similarity between pairs, represented by a fourth-order tensor, equals the Kronecker product of pairwise similarity matrices under the decomposable assumption, and provides complementary information that the pairwise similarity misses under the indecomposable assumption. A high-order similarity matrix is then obtained from the tensor similarity via eigenvalue decomposition. This high-order similarity, which captures spatial information, serves as a robust complement to the pairwise similarity. It is further integrated with the popular pairwise similarity, an approach named IPS2, to boost clustering performance. Extensive experiments demonstrate that the proposed IPS2 significantly outperforms previous similarity-based methods on real-world datasets and can handle clustering tasks on under-sampled and noisy datasets.
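The decomposable case described above can be illustrated with a small numerical sketch. This is not the authors' implementation: the Gaussian kernel, the `sigma` value, the toy data, and the function names `pairwise_similarity` and `high_order_from_kron` are all assumptions made for illustration. It unfolds the fourth-order tensor as the Kronecker product of a pairwise similarity matrix with itself, takes the leading eigenvector, and reshapes it into an n x n matrix.

```python
import numpy as np

def pairwise_similarity(X, sigma=2.0):
    """Gaussian-kernel pairwise similarity (an assumed choice of affinity)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))

def high_order_from_kron(S):
    """Hypothetical helper: high-order matrix in the decomposable case.

    The fourth-order tensor is unfolded as the n^2 x n^2 matrix kron(S, S);
    its leading eigenvector is reshaped to n x n.
    """
    n = S.shape[0]
    T = np.kron(S, S)                 # unfolded tensor similarity
    _, vecs = np.linalg.eigh(T)       # eigenvalues in ascending order
    H = vecs[:, -1].reshape(n, n)     # leading eigenvector as a matrix
    H = 0.5 * (H + H.T)               # remove numerical asymmetry
    if np.trace(H) < 0:               # eigenvector sign is arbitrary
        H = -H
    return H

# Toy data: two groups of four 2-D points (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (4, 2)), rng.normal(3.0, 0.3, (4, 2))])

S = pairwise_similarity(X)
H = high_order_from_kron(S)
```

In this decomposable setting the reshaped eigenvector coincides with the outer product of the leading eigenvector of `S` with itself, so it adds no information beyond `S`. According to the abstract, the complementary information arises only under the indecomposable assumption, which this sketch does not model. Note also that forming `kron(S, S)` explicitly costs O(n^4) memory, so this is only practical for very small n.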
