Design and Implementation of an Efficient Parallel Feel-the-Way Clustering Algorithm on High Performance Computing Systems

This paper proposes a Feel-the-Way clustering method that reduces synchronization and communication overhead while providing a convergence rate as good as or better than that of synchronous clustering methods. The Feel-the-Way algorithm explores the problem space following the philosophy of "crossing an unknown river by feeling the pebbles in the riverbed." We first design a full-step Feel-the-Way algorithm to reduce the number of iterations: each process runs L local steps before synchronizing its local solutions with the other processes. This full-step algorithm significantly decreases the number of iterations compared with the k-means clustering method. Next, we extend the full-step algorithm to a sampling-based Feel-the-Way algorithm to achieve higher performance. Furthermore, we prove that both new algorithms (full-step and sampling-based Feel-the-Way) always converge. Our empirical results demonstrate that the optimized sampling-based Feel-the-Way method is much faster than the widely used k-means clustering method while delivering comparable clustering costs. Experiments with synthetic datasets and the real-world datasets MNIST, CIFAR-10, ENRON, and PLACES-2 show that the new parallel algorithm outperforms k-means by up to 235% on a high performance computing system with 2,048 CPU cores.
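To make the full-step scheme concrete, the sketch below shows one plausible shape of the loop the abstract describes: each MPI rank performs L local Lloyd-style updates on its own data partition before a synchronization round merges the local solutions. This is a minimal illustration under stated assumptions, not the authors' implementation; the function names, the count-weighted merging of centroids, and the parameters L and max_rounds are all assumptions made for this sketch.

```python
# Minimal sketch of a full-step Feel-the-Way loop, assuming a
# count-weighted centroid merge at each synchronization point.
import numpy as np
from mpi4py import MPI


def local_lloyd_step(points, centroids):
    """One Lloyd iteration on this rank's partition of the dataset."""
    # Assign each local point to its nearest centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Recompute centroids from the local assignment; keep the old
    # centroid for any cluster that received no local points.
    new_centroids = centroids.copy()
    counts = np.zeros(len(centroids))
    for k in range(len(centroids)):
        mask = labels == k
        counts[k] = mask.sum()
        if counts[k] > 0:
            new_centroids[k] = points[mask].mean(axis=0)
    return new_centroids, counts


def feel_the_way_full_step(points, centroids, L=4, max_rounds=50):
    """Run L communication-free local steps per synchronization round."""
    comm = MPI.COMM_WORLD
    for _ in range(max_rounds):
        # "Feel the pebbles": L local steps with no communication.
        for _ in range(L):
            centroids, counts = local_lloyd_step(points, centroids)
        # Synchronize: merge the ranks' local solutions as a
        # count-weighted mean of their centroids.
        weighted = centroids * counts[:, None]
        global_weighted = np.empty_like(weighted)
        global_counts = np.empty_like(counts)
        comm.Allreduce(weighted, global_weighted, op=MPI.SUM)
        comm.Allreduce(counts, global_counts, op=MPI.SUM)
        nonzero = global_counts > 0
        centroids[nonzero] = (global_weighted[nonzero]
                              / global_counts[nonzero, None])
    return centroids
```

The design point this sketch tries to capture is the trade-off named in the abstract: raising L cuts the number of synchronization rounds (and hence communication and barrier overhead) at the cost of letting each rank's solution drift further between merges.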
