Research and Implementation of a Spectral Clustering Algorithm Based on the Hadoop Cloud Platform

Spectral clustering is based on spectral graph theory: it recasts the optimal clustering problem as a graph partitioning problem. It is a pairwise clustering algorithm that can cluster high-dimensional data sets after dimensionality reduction, which greatly reduces clustering time. Compared with traditional clustering algorithms, spectral clustering has the advantage that it can cluster sample spaces of arbitrary shape and converge to the global optimal solution. However, large data sets are prevalent in the real world. When spectral clustering is applied to them, the volume of data slows convergence, sometimes to the point that no result can be obtained within the allotted time, which makes clustering difficult in practice. This paper therefore uses the Hadoop cloud platform to cluster large-scale, high-dimensional data sets. Experiments show that the spectral clustering algorithm, once parallelized and deployed on a Hadoop cluster, achieves good speedup and good scalability.
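For readers unfamiliar with the pipeline summarized above (similarity graph, spectral dimensionality reduction, then clustering in the reduced space), the following is a minimal single-machine sketch in Python with NumPy. It is not the Hadoop implementation described in this paper. The Gaussian similarity with bandwidth `sigma`, the symmetric normalized Laplacian, the farthest-point initialization, and the function name `spectral_clustering` are all illustrative assumptions.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Illustrative sketch: cluster the rows of X into k groups."""
    # 1. Pairwise Gaussian similarity matrix W (assumed kernel; sigma is hypothetical).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # 2. Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # 3. Dimensionality reduction: keep the k eigenvectors with the smallest
    #    eigenvalues (np.linalg.eigh returns eigenvalues in ascending order).
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    # 4. k-means (Lloyd iterations) on the rows of U, with a deterministic
    #    farthest-point initialization of the centers.
    centers = [U[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels
```

In this sketch the similarity matrix needs O(n^2) time and memory and the dense eigendecomposition needs O(n^3) time, which is the kind of cost that makes large data sets impractical on one machine. Each row of the similarity matrix depends only on one point and the full data set, so rows can in principle be computed independently; how the work is actually split across Hadoop tasks is not specified in this abstract.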