Modified Multidimensional Scaling and High Dimensional Clustering

Multidimensional scaling is an important dimension reduction tool in statistics and machine learning, yet few theoretical results characterize its statistical performance, let alone in high dimensions. By considering a unified framework that includes low, moderate and high dimensions, we study multidimensional scaling in the setting of clustering noisy data. Our results suggest that classical multidimensional scaling can be modified to further improve the quality of the embedded samples, especially as the noise level increases. To this end, we propose {\it modified multidimensional scaling}, which applies a nonlinear transformation to the sample eigenvalues. The nonlinear transformation depends on the dimensionality, the sample size and the moments of the noise. We show that modified multidimensional scaling followed by various clustering algorithms achieves exact recovery, i.e., all cluster labels are recovered correctly with probability tending to one. Numerical simulations and two real data applications lend strong support to our proposed methodology.
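
The abstract specifies only that the modification transforms the sample eigenvalues before embedding, so the following is a minimal sketch rather than the paper's estimator: classical multidimensional scaling via double centering and eigendecomposition, with a hypothetical eigenvalue shrinkage (the `shrink` function and `noise_floor` constant below are placeholders) standing in for the paper's nonlinear transformation, which would be calibrated from the dimensionality, sample size and noise moments. k-means plays the role of the downstream clustering step.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans

def modified_mds(X, k, transform):
    """Classical MDS with a user-supplied eigenvalue transformation.

    X         : (n, p) noisy data matrix
    k         : target embedding dimension
    transform : function applied to the top-k sample eigenvalues;
                the paper's transformation depends on n, p and the
                noise moments, so a placeholder is used below
    """
    D2 = squareform(pdist(X)) ** 2           # squared Euclidean distances
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ D2 @ J                    # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]         # top-k eigenpairs
    lam = transform(np.maximum(vals[idx], 0.0))
    return vecs[:, idx] * np.sqrt(lam)       # (n, k) embedded samples

# Hypothetical shrinkage: subtract a noise floor from each eigenvalue.
# In the paper this calibration would come from n, p and the noise moments.
noise_floor = 1.0
shrink = lambda lam: np.maximum(lam - noise_floor, 0.0)

# Toy example: three well-separated Gaussian clusters in p = 50 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 50)) * 3.0
X = np.vstack([c + rng.normal(size=(40, 50)) for c in centers])
Y = modified_mds(X, k=3, transform=shrink)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Y)
```

With the identity transformation this reduces to classical multidimensional scaling; the point of the sketch is that the only change needed is the map applied to the eigenvalues before the embedding is formed.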
