An optimized approach for unbalanced big data categorizing using fuzzy clustering

Big data refers to very large and complex data sets that are hard to load on computers. The main challenges in big data concern searching, categorizing, and analyzing the data, especially when the data are unbalanced. Although much work has been done in the field of big data, analyzing unbalanced big data remains a fundamental challenge. In this paper, we address the weakness of the RSIO-LFCM method on unbalanced data and improve its accuracy in the training phase so that it can identify classes with a low frequency of samples. The proposed method first makes a small change to the initial phase of the algorithm and then adds a phase that balances sample frequencies to resolve the problems of RSIO-LFCM. The results show that, compared with RSIO-LFCM, the proposed method is more accurate in identifying super clusters and their corresponding super classes, as well as small clusters and classes.
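
The abstract does not specify how the added phase balances sample frequencies. As an illustration only, the following minimal sketch assumes one common technique, random oversampling, in which samples of low-frequency classes are duplicated until every class is as large as the largest one. The function name `balance_by_oversampling` and the toy data are hypothetical and are not taken from the paper or from RSIO-LFCM.

```python
import random
from collections import Counter, defaultdict


def balance_by_oversampling(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class has as many samples as the largest class.

    Assumption: this is generic random oversampling, used here only to
    illustrate what a frequency-balancing phase could look like.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Every class is grown to the size of the most frequent class.
    target = max(len(xs) for xs in by_class.values())

    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_samples.append(x)
            out_labels.append(y)
    return out_samples, out_labels


# Toy unbalanced data: four samples of class "a", one of class "b".
samples = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.3, 0.2), (5.0, 5.1)]
labels = ["a", "a", "a", "a", "b"]

balanced_samples, balanced_labels = balance_by_oversampling(samples, labels)
counts = Counter(balanced_labels)
```

After this step both classes have the same number of samples, so a clustering stage trained on `balanced_samples` would no longer see the small class as a negligible fraction of the data. Duplicating samples can encourage overfitting to the minority class, which is one reason an actual method may balance frequencies differently.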
