Sampling with Minimum Sum of Squared Similarities for Nystrom-Based Large Scale Spectral Clustering

Nystrom sampling provides an efficient approach to large scale clustering problems by generating a low-rank approximation of the similarity matrix. However, existing sampling methods are limited in both accuracy and computing time. This paper proposes a scalable Nystrom-based clustering algorithm with a new sampling procedure, Minimum Sum of Squared Similarities (MSSS). We provide a theoretical analysis of the upper error bound of our algorithm and demonstrate its performance in comparison with the leading spectral clustering methods that use Nystrom sampling.
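For readers unfamiliar with the low-rank approximation mentioned above, the following is a minimal sketch of a generic Nystrom approximation of a similarity matrix. It is not the MSSS procedure, whose details are not given in this abstract: the landmark points are drawn uniformly at random, the similarity is assumed to be a Gaussian (RBF) kernel, and the names `rbf_kernel`, `nystrom_approximation`, `gamma`, and `m` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian similarities between rows of X and rows of Y (assumed kernel)."""
    sq_dists = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def nystrom_approximation(X, m, gamma=1.0, seed=0):
    """Approximate the n x n similarity matrix from m sampled columns.

    Uniform sampling is used here only as a placeholder; a method such as
    MSSS would replace the line that chooses `idx`.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # landmark indices
    C = rbf_kernel(X, X[idx], gamma)             # n x m block of sampled columns
    W = C[idx]                                   # m x m block among the landmarks
    K_approx = C @ np.linalg.pinv(W) @ C.T       # rank <= m approximation
    return K_approx, idx
```

Only the `n x m` block `C` and the `m x m` block `W` are computed from the data, which is what makes the approach attractive when `n` is large; in practice one would work with the factors directly rather than form the full `n x n` product as this sketch does for clarity.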
