Paper information - Large-Scale Hierarchical k-means for Heterogeneous Many-Core Supercomputers

This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that partitions not only by dataflow and centroid, but also by dimension. This multi-level (nkd) approach unlocks the hierarchical parallelism of the SW26010 heterogeneous many-core processor and of the supercomputer's system architecture. The design can process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids while maintaining high performance and scalability, significantly extending the capability of k-means over previous approaches. In our evaluation, the implementation takes less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids when run on 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.
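To make the three partition levels concrete, the following is a minimal sequential sketch of the k-means assignment step with the work blocked by data (n), centroid (k), and dimension (d). It is an illustration under stated assumptions, not the authors' implementation: the names `chunks` and `assign_nkd` and the parameters `n_parts`, `k_parts`, and `d_parts` are hypothetical, and plain Python loops stand in for the parallel units that would each own one block on real hardware.

```python
def chunks(length, parts):
    """Split range(length) into at most `parts` contiguous, nearly equal ranges."""
    step = max(1, -(-length // parts))  # ceiling division, at least 1
    return [range(i, min(i + step, length)) for i in range(0, length, step)]


def assign_nkd(samples, centroids, n_parts, k_parts, d_parts):
    """Label each sample with the index of its nearest centroid.

    Hypothetical illustration: each loop level corresponds to one
    partition level (n, k, d); in a parallel setting each block would
    be handled by a different processing unit.
    """
    dims = len(centroids[0])
    d_blocks = chunks(dims, d_parts)
    labels = [0] * len(samples)
    for n_block in chunks(len(samples), n_parts):            # level n: data blocks
        for i in n_block:
            best_dist, best_j = float("inf"), -1
            for k_block in chunks(len(centroids), k_parts):  # level k: centroid blocks
                for j in k_block:
                    # level d: each dimension block yields a partial squared distance
                    partial = [
                        sum((samples[i][t] - centroids[j][t]) ** 2 for t in d_block)
                        for d_block in d_blocks
                    ]
                    dist = sum(partial)  # reduction across dimension blocks
                    if dist < best_dist:
                        best_dist, best_j = dist, j
            labels[i] = best_j
    return labels
```

The point of the dimension level is that no single unit needs to hold a full sample or a full centroid: each unit computes a partial sum over its slice of dimensions, and a reduction combines the partial sums before the nearest centroid is chosen. This is one plausible reading of how partitioning by dimension could accommodate very high-dimensional data.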
