Design and Implementation of an Efficient Parallel Feel-the-Way Clustering Algorithm on High Performance Computing Systems

This paper proposes a Feel-the-Way clustering method that reduces synchronization and communication overhead while providing a convergence rate as good as or better than that of synchronous clustering methods. The Feel-the-Way algorithm explores the problem space following the philosophy of "crossing an unknown river by feeling the pebbles in the riverbed." We first design a full-step Feel-the-Way algorithm to reduce the number of iterations: each process runs L local steps before synchronizing its local solutions with the other processes. This full-step algorithm significantly decreases the number of iterations compared with the k-means clustering method. Next, we extend the full-step algorithm to a sampling-based Feel-the-Way algorithm to achieve higher performance. Furthermore, we prove that both new algorithms (full-step and sampling-based Feel-the-Way) always converge. Our empirical results demonstrate that the optimized sampling-based Feel-the-Way method is much faster than the widely used k-means clustering method while delivering comparable clustering costs. Experiments with synthetic datasets and the real-world datasets MNIST, CIFAR-10, ENRON, and PLACES-2 show that the new parallel algorithm outperforms k-means by up to 235% on a high performance computing system with 2,048 CPU cores.
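To make the full-step scheme concrete, the sketch below shows one plausible shape of the loop the abstract describes: each MPI rank performs L local Lloyd-style updates on its own data partition before a synchronization round merges the local solutions. This is a minimal illustration under stated assumptions, not the authors' implementation; the function names, the count-weighted merging of centroids, and the parameters L and max_rounds are all assumptions made for this sketch.

```python
# Minimal sketch of a full-step Feel-the-Way loop, assuming a
# count-weighted centroid merge at each synchronization point.
import numpy as np
from mpi4py import MPI


def local_lloyd_step(points, centroids):
    """One Lloyd iteration on this rank's partition of the dataset."""
    # Assign each local point to its nearest centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Recompute centroids from the local assignment; keep the old
    # centroid for any cluster that received no local points.
    new_centroids = centroids.copy()
    counts = np.zeros(len(centroids))
    for k in range(len(centroids)):
        mask = labels == k
        counts[k] = mask.sum()
        if counts[k] > 0:
            new_centroids[k] = points[mask].mean(axis=0)
    return new_centroids, counts


def feel_the_way_full_step(points, centroids, L=4, max_rounds=50):
    """Run L communication-free local steps per synchronization round."""
    comm = MPI.COMM_WORLD
    for _ in range(max_rounds):
        # "Feel the pebbles": L local steps with no communication.
        for _ in range(L):
            centroids, counts = local_lloyd_step(points, centroids)
        # Synchronize: merge the ranks' local solutions as a
        # count-weighted mean of their centroids.
        weighted = centroids * counts[:, None]
        global_weighted = np.empty_like(weighted)
        global_counts = np.empty_like(counts)
        comm.Allreduce(weighted, global_weighted, op=MPI.SUM)
        comm.Allreduce(counts, global_counts, op=MPI.SUM)
        nonzero = global_counts > 0
        centroids[nonzero] = (global_weighted[nonzero]
                              / global_counts[nonzero, None])
    return centroids
```

The design point this sketch tries to capture is the trade-off named in the abstract: raising L cuts the number of synchronization rounds (and hence communication and barrier overhead) at the cost of letting each rank's solution drift further between merges.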
