Research on Improved k-Means Clustering Algorithm Based on Hadoop Platform
This paper targets two problems of the traditional k-means clustering algorithm in big data processing: its limited performance and the difficulty of determining the initial cluster centers. To address them, it proposes an improved k-means clustering algorithm based on the Hadoop platform. The algorithm combines the canopy algorithm with cosine similarity to optimize how k-means determines its initial cluster centers, and it uses a parallel computing framework to parallelize the algorithm so that it can handle big data. Experimental results show that the improved algorithm produces better clustering results and also achieves good speedup and scalability when processing large amounts of data.
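The abstract does not specify exactly how the canopy pass, cosine similarity, and k-means are combined, or how the parallel jobs are structured. The following is a minimal Python sketch under stated assumptions, not the paper's implementation. Canopy clustering uses cosine distance (1 - cosine similarity) with illustrative thresholds t1 > t2. The number of canopies is taken as k, and each canopy's mean seeds k-means. `map_assign` and `reduce_update` are hypothetical plain-Python stand-ins for a Hadoop Mapper and Reducer, not Hadoop API calls. All function names and threshold values are illustrative.

```python
import math
from collections import defaultdict


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    return 1.0 - cosine_similarity(a, b)


def mean(vectors):
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def canopy_centers(points, t1, t2):
    """Canopy pass with loose threshold t1 > tight threshold t2 (cosine distance).

    Assumption: each canopy's center is the mean of its members, and the
    number of canopies is used as k for the subsequent k-means run.
    """
    if t1 <= t2:
        raise ValueError("canopy requires t1 > t2")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        # Loose threshold: every remaining point closer than t1 joins this canopy.
        members = [seed] + [p for p in remaining if cosine_distance(seed, p) < t1]
        centers.append(mean(members))
        # Tight threshold: points closer than t2 may no longer seed a canopy.
        remaining = [p for p in remaining if cosine_distance(seed, p) >= t2]
    return centers


def map_assign(points, centers):
    """Hypothetical Mapper stand-in: emit (cluster_id, (point, 1)) for the most similar center."""
    for p in points:
        best = max(range(len(centers)), key=lambda i: cosine_similarity(p, centers[i]))
        yield best, (p, 1)


def reduce_update(pairs, centers):
    """Hypothetical Reducer stand-in: sum vectors and counts per cluster, then average."""
    sums, counts = {}, defaultdict(int)
    for cid, (vec, n) in pairs:
        sums[cid] = tuple(s + v for s, v in zip(sums[cid], vec)) if cid in sums else vec
        counts[cid] += n
    # A cluster that received no points keeps its previous center.
    return [tuple(s / counts[i] for s in sums[i]) if i in sums else centers[i]
            for i in range(len(centers))]


def kmeans(points, centers, max_iter=20, tol=1e-9):
    """Repeat map/reduce rounds until no center moves by more than tol."""
    for _ in range(max_iter):
        new_centers = reduce_update(map_assign(points, centers), centers)
        shift = max(cosine_distance(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    labels = [cid for cid, _ in map_assign(points, centers)]
    return centers, labels


# Usage on toy data; t1=0.3 and t2=0.1 are illustrative assumptions.
points = [(1.0, 0.1), (0.9, 0.2), (1.0, 0.0), (0.1, 1.0), (0.2, 0.9), (0.0, 1.0)]
seeds = canopy_centers(points, t1=0.3, t2=0.1)
centers, labels = kmeans(points, seeds)
print(len(seeds), labels)  # 2 [0, 0, 0, 1, 1, 1]
```

Because the reduce step only needs per-cluster vector sums and counts, a Hadoop version of this sketch could pre-aggregate those partial sums with a combiner on each node before the shuffle, which reduces the data sent to the reducers.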