Research and implementation of a spectral clustering algorithm on the Hadoop cloud platform
Spectral clustering is based on spectral graph theory: it recasts the optimal clustering problem as a graph partitioning problem. It is a pairwise clustering algorithm that can cluster high-dimensional data sets after dimensionality reduction, which greatly reduces clustering time. Compared with traditional clustering algorithms, spectral clustering has the advantage that it can cluster sample spaces of arbitrary shape and converge to the global optimal solution. However, large data sets are prevalent in the real world. When spectral clustering is applied to a large data set, the sheer volume of data slows convergence, and it may even be impossible to obtain results within the required time, which causes many problems for clustering. This paper therefore uses the Hadoop cloud platform to cluster large-scale, high-dimensional data sets. Experiments show that the spectral clustering algorithm, once parallelized and deployed on a Hadoop cluster, achieves good speedup and good scalability.
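To make the graph-partitioning view concrete, here is a minimal single-machine sketch of a two-way spectral split. It is not the paper's Hadoop implementation. The Gaussian affinity, the unnormalized Laplacian, the shifted power iteration, and all function names (`gaussian_affinity`, `fiedler_vector`, `spectral_bipartition`) are assumptions chosen for illustration.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Build a symmetric affinity matrix W with a Gaussian kernel (assumed choice)."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def fiedler_vector(w, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L = D - W.

    Uses power iteration on (shift * I - L), removing the constant
    component (the eigenvector for eigenvalue 0) at every step.
    """
    n = len(w)
    degree = [sum(row) for row in w]
    shift = 2.0 * max(degree)  # Gershgorin bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        # Multiply by (shift * I - L): shift*v - D v + W v
        v = [
            (shift - degree[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
    mean = sum(v) / n
    return [x - mean for x in v]


def spectral_bipartition(points, sigma=1.0):
    """Split points into two clusters by the sign of the Fiedler vector."""
    v = fiedler_vector(gaussian_affinity(points, sigma))
    return [1 if x >= 0 else 0 for x in v]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(spectral_bipartition(data))
```

The pairwise affinity step alone grows quadratically with the number of points, which illustrates why a single machine becomes slow on large data sets.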