Evolving Classifier TEDAClass for Big Data

Abstract In the era of big data, huge amounts of data are generated and updated every day, and processing and analysing them is a major challenge. Tackling this challenge requires dedicated techniques that can process large volumes of data within limited run times. TEDA is a new systematic framework for data analytics based on the typicality and eccentricity of the data. The framework is spatially aware, non-frequentist and non-parametric. TEDA can be used to develop alternative machine learning methods; in this work, we use it for classification (TEDAClass). Specifically, we present a TEDAClass-based approach that can process huge amounts of data items using a novel parallelization technique, which makes TEDAClass scalable. The proposed approach opens the door to high-performance big data processing, which could be particularly useful in healthcare, banking, scientific and many other applications.
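
To make the notions of eccentricity and typicality concrete, the following is a minimal sketch and not the implementation used in this work. It assumes the Euclidean-distance form commonly given in the TEDA literature, in which the eccentricity of a point x among k samples with mean mu and variance var is 1/k + ||x - mu||^2 / (k * var), and the typicality is one minus the eccentricity. The names `Stats`, `chunk_stats`, `merge`, `eccentricity` and `typicality` are hypothetical. The additive merging of per-chunk statistics only illustrates why such quantities can be computed over partitioned data; it should not be read as the parallelization technique proposed here.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Stats:
    """Sufficient statistics of a set of samples (hypothetical helper)."""
    k: int          # number of samples
    s: np.ndarray   # element-wise sum of the samples
    q: float        # sum of squared Euclidean norms of the samples


def chunk_stats(chunk) -> Stats:
    """Compute the statistics of one chunk; chunks can be handled independently."""
    chunk = np.asarray(chunk, dtype=float)
    return Stats(len(chunk), chunk.sum(axis=0), float((chunk ** 2).sum()))


def merge(a: Stats, b: Stats) -> Stats:
    """Combine the statistics of two chunks; all three fields are additive."""
    return Stats(a.k + b.k, a.s + b.s, a.q + b.q)


def eccentricity(x, st: Stats) -> float:
    """Eccentricity of x, assuming the Euclidean form 1/k + ||x - mu||^2 / (k * var)."""
    mu = st.s / st.k
    var = st.q / st.k - float(mu @ mu)  # population variance summed over dimensions
    if st.k < 2 or var <= 0.0:
        raise ValueError("need at least two non-identical samples")
    d2 = float(np.sum((np.asarray(x, dtype=float) - mu) ** 2))
    return 1.0 / st.k + d2 / (st.k * var)


def typicality(x, st: Stats) -> float:
    """Typicality is the complement of eccentricity."""
    return 1.0 - eccentricity(x, st)


if __name__ == "__main__":
    data = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [10.0, 10.0]])
    # Two chunks processed separately, then merged.
    merged = merge(chunk_stats(data[:2]), chunk_stats(data[2:]))
    for x in data:
        print(x, eccentricity(x, merged), typicality(x, merged))
```

Under this assumed form, the eccentricities of all samples in a data set sum to 2, and the point far from the rest, here (10, 10), receives the highest eccentricity and therefore the lowest typicality.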
