Parallelizing DBSCAN Algorithm Using MPI

In recent years, computational systems and electronic devices have been gathering huge volumes of information. To exploit this data, new algorithms must be developed or established ones must be adapted. Clustering is one of the most fascinating and productive techniques for locating and extracting information from data repositories, and DBSCAN is a successful density-based clustering algorithm that groups data according to its characteristics. However, its main disadvantage is its high computational complexity, which makes the technique impractical for large datasets. Although DBSCAN is a very well-studied technique, the scientific community has not yet accepted a fully operational parallel version of it. In this work, a three-phase parallel version of DBSCAN is presented. The experimental results are very promising and demonstrate the correctness, scalability, and effectiveness of the technique.
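To make the "density-based" description and the cost problem concrete, the following is a minimal sequential sketch of standard DBSCAN. It is not the three-phase parallel version presented in this work. It assumes Euclidean distance and the usual parameters `eps` (neighborhood radius) and `min_pts` (minimum number of neighbors, counting the point itself, for a point to be a core point). The brute-force `region_query` helper is an illustrative assumption; practical implementations typically use a spatial index instead.

```python
import math
from collections import deque

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i], including i itself.

    Hypothetical brute-force helper: it scans every point, so one query is O(n).
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may later be relabeled as a border point.
            labels[i] = NOISE
            continue
        # i is a core point: start a new cluster and expand it breadth-first.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                # Border point: density-reachable, but not itself a core point.
                labels[j] = cluster_id
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                # j is also a core point, so its neighbors join the cluster.
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

With n points, this sketch issues up to n region queries and each one scans all n points, so it performs O(n^2) distance computations. That quadratic cost is what makes plain DBSCAN hard to apply to large datasets and motivates distributing the work across processes.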
