An empirical model (EM: CCO) for clustering, convergence and center optimization in distributive databases

Conventional clustering methods assume that data is stored centrally and is memory resident, which makes it difficult to arrive at solutions when dealing with large data. Centralizing huge volumes of data from multiple locations is always a challenging task owing to the large memory space and computational time required by traditional mining methods. Traditional k-means-type clustering has been used to identify cluster prototypes that can serve as representative points in a large dataset. Its major drawback is that the cluster centers tend to distort the distribution of the underlying data, so the representative points cannot capture the complete distribution of the data, which leads to poor pattern generation. To resolve this issue, this paper proposes an empirical model (EM) that ensures the cluster centers capture the underlying data distribution. First, in the proposed methodology, asymptotic convergence is centered on the distributed data. Second, an efficient mechanism is given for measuring the cluster centers in practice. Finally, a methodology for distributive convergence and center optimization is proposed. The model is compared with other methods in the literature, and the results are discussed.
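For orientation only, the following is a minimal sketch of how a k-means center update can be computed over data held at several sites without centralizing the raw points. It is a generic distributed variant of Lloyd's algorithm, not the EM: CCO model proposed in the paper, and every name in it (`local_stats`, `merge_and_update`, `distributed_kmeans`, the `sites` layout, the `tol` stopping threshold) is a hypothetical choice made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Sequence[float], centers: Sequence[Point]) -> int:
    """Index of the center with the smallest squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
    )


def local_stats(points: Sequence[Point], centers: Sequence[Point]):
    """Runs at one site: per-center coordinate sums and point counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for pt in points:
        j = nearest(pt, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += pt[d]
    return sums, counts


def merge_and_update(stats, centers: Sequence[Point]) -> List[Point]:
    """Runs at the coordinator: combine site statistics into new centers."""
    dim = len(centers[0])
    updated = []
    for j, old in enumerate(centers):
        total = sum(counts[j] for _, counts in stats)
        if total == 0:
            # Assumption: a center that attracts no points keeps its position.
            updated.append(tuple(old))
            continue
        updated.append(
            tuple(sum(sums[j][d] for sums, _ in stats) / total for d in range(dim))
        )
    return updated


def distributed_kmeans(sites, centers, max_iter: int = 100, tol: float = 1e-9):
    """Iterate until the largest squared center shift is at most tol."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Only sums and counts leave each site, never the raw points.
        stats = [local_stats(site, centers) for site in sites]
        updated = merge_and_update(stats, centers)
        shift = max(
            sum((a - b) ** 2 for a, b in zip(u, c))
            for u, c in zip(updated, centers)
        )
        centers = updated
        if shift <= tol:
            break
    return centers
```

Because each site transmits only per-center sums and counts, the merged update equals the one a centralized k-means step would produce on the pooled data. The sketch therefore inherits the limitation described above: the resulting centers are plain means and need not reflect the full underlying distribution.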
