A MapReduce fuzzy techniques of big data classification

As data sets grow, efficient analysis with traditional techniques becomes difficult. Big data poses many challenges because of its characteristics: volume, velocity, variety, variability, value, and complexity. There is a need not only for efficient data mining techniques that can process large volumes of data, but also for a means of meeting the computational requirements of doing so. The objective of this research is to implement the MapReduce paradigm using fuzzy and crisp techniques, and to provide a comparative study between the results of the proposed systems and those of the methods reviewed in the literature. In this paper, four proposed systems are implemented using the MapReduce paradigm to process big data. In the mapper, two techniques are used: the fuzzy k-nearest neighbor (fuzzy KNN) method as a fuzzy technique and the support vector machine (SVM) as a non-fuzzy technique. In the reducer, three techniques are used: the mode, fuzzy soft labels, and the Gaussian fuzzy membership function. The first proposed system uses fuzzy KNN in the mapper and the mode in the reducer; the second uses SVM in the mapper and the mode in the reducer; the third uses SVM in the mapper and soft labels in the reducer; and the fourth uses SVM in the mapper and the Gaussian fuzzy membership function in the reducer. Results on different data sets show that the proposed fuzzy methods outperform the proposed crisp method and the method reviewed in the literature.
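
As a rough illustration of the first proposed system (fuzzy KNN in the mapper, the mode in the reducer), the following minimal Python sketch simulates the two phases in a single process. It is not the authors' implementation. The abstract does not say how data is partitioned or what the mapper emits, so the sketch assumes that each mapper holds one partition of the training data, classifies every test point with a Keller-style fuzzy KNN using crisp training labels, and emits a (test index, label) pair. The function names, the fuzzifier `m`, and the small constant `eps` are hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def fuzzy_knn_memberships(train, x, k=3, m=2.0, eps=1e-9):
    """Class memberships of x from its k nearest neighbours.

    Assumes crisp training labels, so each neighbour contributes an
    inverse-distance weight 1 / d^(2 / (m - 1)) to its own class only.
    """
    neighbours = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    weights = defaultdict(float)
    for features, label in neighbours:
        d = math.dist(features, x)
        # eps avoids division by zero when x coincides with a training point
        weights[label] += 1.0 / (d ** (2.0 / (m - 1.0)) + eps)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def mapper(partition, test_points, k=3):
    """Emit (test index, predicted label) pairs for one training partition."""
    for idx, x in enumerate(test_points):
        memberships = fuzzy_knn_memberships(partition, x, k=k)
        yield idx, max(memberships, key=memberships.get)


def reducer(labels):
    """Combine the labels emitted for one test point by taking the mode."""
    return Counter(labels).most_common(1)[0][0]


def run(partitions, test_points, k=3):
    """Simulate the map, shuffle and reduce phases sequentially."""
    grouped = defaultdict(list)
    for partition in partitions:
        for idx, label in mapper(partition, test_points, k=k):
            grouped[idx].append(label)  # shuffle: group by test index
    return [reducer(grouped[idx]) for idx in sorted(grouped)]
```

Calling `run(partitions, test_points)` with a list of training partitions, each a list of `(feature_tuple, label)` pairs, returns one label per test point. Swapping `reducer` for a function that averages the membership dictionaries instead of voting on hard labels would be one way to approximate a soft-label reducer, though the abstract does not specify that design.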
