Paper Information - Research on Clustering Algorithm for Large Data Sets

Research on Clustering Algorithm for Large Data Sets

This paper proposes the DBk-means algorithm to address the clustering problem for large data sets. Hadoop is used to preprocess the large original log data, and the algorithm combines the strengths of the k-means and DBSCAN algorithms. Experimental results show that DBk-means achieves a better clustering effect than k-means alone, and its accuracy can exceed 83%.
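
The abstract does not spell out how DBk-means couples the two methods, so the following is only a minimal sketch of one plausible combination, not the paper's actual procedure: a density pass (DBSCAN) finds dense groups and flags noise, the mean of each dense group seeds k-means, and k-means then assigns every point. The function names (`dbscan`, `kmeans`, `dbk_means_sketch`) and the parameters `eps` and `min_pts` are hypothetical choices made for illustration, and the Hadoop preprocessing step is omitted.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point: cluster id >= 0, or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise becomes a border point
            if labels[j] is not None:
                continue                # already handled, do not expand again
            labels[j] = cluster
            more = region_query(points, j, eps)
            if len(more) >= min_pts:    # j is a core point, so keep expanding
                queue.extend(more)
    return labels

def mean(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))

def kmeans(points, centroids, iters=20):
    """Lloyd's k-means starting from the given centroids."""
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        updated = []
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == k]
            updated.append(mean(members) if members else centroids[k])
        if updated == centroids:
            break                       # converged
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]

def dbk_means_sketch(points, eps, min_pts):
    """ASSUMED combination: DBSCAN cluster means seed k-means (k is not fixed by hand)."""
    labels = dbscan(points, eps, min_pts)
    k = max(labels) + 1
    if k == 0:
        raise ValueError("DBSCAN found no dense cluster; adjust eps or min_pts")
    seeds = [mean([p for p, l in zip(points, labels) if l == c]) for c in range(k)]
    return kmeans(points, seeds)

if __name__ == "__main__":
    # Two small dense groups plus one isolated point (toy data, not from the paper).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11), (5, 20)]
    centroids, assignment = dbk_means_sketch(data, eps=1.5, min_pts=3)
    print(len(centroids), assignment)
```

In this arrangement the density pass chooses both the number of clusters and their starting centroids, which is one common motivation for pairing DBSCAN with k-means; whether the paper uses this ordering is not stated in the abstract.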

Guan Yi