Performance analysis of parallel CBAR in MapReduce environment

Clustering large data sets is a current problem in big data handling, and parallelizing the clustering process improves efficiency for applications that involve frequent searching. Various clustering techniques are used to group data sets, and CBAR is one technique frequently used across different applications. Parallelizing CBAR is necessary for handling big data, and the Hadoop MapReduce platform provides a suitable environment for improving efficiency on problems that deal with huge volumes of data, provided an appropriate segmentation technique is used. In this work, we designed and developed a few algorithms that implement CBAR using MapReduce and tested them on clusters of up to 4 nodes. We observed significant improvement, and we present an analysis and explanation of these results with a suitable example.
