Large-Scale Hierarchical k-means for Heterogeneous Many-Core Supercomputers

This paper presents a novel design and implementation of the k-means clustering algorithm targeting the Sunway TaihuLight supercomputer. We introduce a multi-level parallel partition approach that partitions the problem not only by dataflow and centroid, but also by dimension. This multi-level (nkd) approach unlocks the hierarchical parallelism of both the SW26010 heterogeneous many-core processor and the system architecture of the supercomputer. Our design can process large-scale clustering problems with up to 196,608 dimensions and over 160,000 target centroids while maintaining high performance and high scalability, significantly extending the capability of k-means beyond previous approaches. In our evaluation, the implementation takes less than 18 seconds per iteration for a large-scale clustering case with 196,608 data dimensions and 2,000 centroids when run on 4,096 nodes (1,064,496 cores) in parallel, making k-means a more feasible solution for complex scenarios.
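The three partition levels named above (samples n, centroids k, dimensions d) can be illustrated with a small sketch of the k-means assignment step. This is not the paper's implementation: the function names `blocks`, `assign_nkd`, and `assign_plain` are hypothetical, and plain sequential loops stand in for whatever parallel units (nodes, core groups, cores) would own each block on a real machine. The sketch only shows that splitting the distance computation by dimension and the nearest-centroid search by centroid, then reducing the partial results, gives the same labels as the unpartitioned computation.

```python
from typing import List, Sequence


def blocks(length: int, parts: int) -> List[range]:
    """Split range(length) into `parts` contiguous, nearly equal blocks."""
    base, extra = divmod(length, parts)
    out, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        out.append(range(start, start + size))
        start += size
    return out


def assign_nkd(samples: Sequence[Sequence[float]],
               centroids: Sequence[Sequence[float]],
               n_parts: int, k_parts: int, d_parts: int) -> List[int]:
    """Assign each sample to its nearest centroid using n/k/d blocks.

    Assumption: each loop over blocks models work that separate
    processing units would do in parallel; here it runs sequentially.
    """
    n, k, d = len(samples), len(centroids), len(samples[0])
    n_blocks, k_blocks, d_blocks = blocks(n, n_parts), blocks(k, k_parts), blocks(d, d_parts)
    labels = [0] * n
    for n_blk in n_blocks:                # level n: split the samples
        for i in n_blk:
            best = (float("inf"), -1)     # (squared distance, centroid index)
            for k_blk in k_blocks:        # level k: split the centroids
                for j in k_blk:
                    # level d: each dimension block yields a partial squared
                    # distance; the partial sums are reduced into one value
                    dist = sum(
                        sum((samples[i][t] - centroids[j][t]) ** 2 for t in d_blk)
                        for d_blk in d_blocks
                    )
                    best = min(best, (dist, j))   # reduce across centroid blocks
            labels[i] = best[1]
    return labels


def assign_plain(samples, centroids) -> List[int]:
    """Unpartitioned reference: nearest centroid by squared Euclidean distance."""
    return [
        min(range(len(centroids)),
            key=lambda j: sum((x - c) ** 2 for x, c in zip(s, centroids[j])))
        for s in samples
    ]
```

With integer inputs the blocked and plain versions agree exactly; with floating-point inputs the different summation order across dimension blocks may change the last bits of a distance, which only matters for near-ties.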
