THRFuzzy: Tangential holoentropy-enabled rough fuzzy classifier to classification of evolving data streams

The rapid development of telecommunications, sensor data, financial applications, and related fields has increased the rate at which data arrive, making data mining, and data stream analysis in particular, a vital process. Among the tasks involved in data analysis, data stream classification faces more challenges than other commonly used techniques. Because stream classification is a continuous process, it requires a design that can adapt the classification model to concept changes, that is, to changes in the boundaries between classes. Hence, we design a novel fuzzy classifier, THRFuzzy, to classify newly arriving data streams. Rough set theory, together with the tangential holoentropy function, supports the design of this dynamic classification model. The approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of THRFuzzy is evaluated on three datasets (skin segmentation, localization, and breast cancer) in terms of accuracy and time, and is compared with the HRFuzzy and adaptive k-NN classifiers. The experimental results show that THRFuzzy outperforms these existing classifiers, achieving the highest accuracy with the lowest computation time.
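The rule-generation step can be illustrated with a minimal sketch of kernel fuzzy c-means using a Gaussian kernel, written in plain Python. Several choices here are assumptions, not details from the paper: the kernel width `sigma`, the fuzzifier `m = 2`, the farthest-first initialization, and the helper names `kernel_fcm`, `build_rules`, and `classify`, which are hypothetical. Turning each cluster into one rule labelled by its membership-weighted majority class is likewise an illustrative stand-in; the paper's rough-set rule construction and its tangential holoentropy membership update are not reproduced here.

```python
import math
from collections import defaultdict


def gaussian_kernel(x, v, sigma):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / sigma^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / sigma ** 2)


def kernel_memberships(x, centers, m, sigma, eps=1e-12):
    """Memberships of x in each cluster from the kernel-induced distance
    d^2 = 2 * (1 - K(x, v)); the constant 2 cancels after normalization."""
    d = [max(1.0 - gaussian_kernel(x, v, sigma), eps) for v in centers]
    inv = [di ** (-1.0 / (m - 1.0)) for di in d]
    total = sum(inv)
    return [w / total for w in inv]


def farthest_first(data, c):
    """Deterministic initialization (an assumption): start from data[0] and
    repeatedly add the point farthest from all chosen centers."""
    centers = [list(data[0])]
    while len(centers) < c:
        nxt = max(data, key=lambda p: min(math.dist(p, v) for v in centers))
        centers.append(list(nxt))
    return centers


def kernel_fcm(data, c, m=2.0, sigma=1.0, iters=100, tol=1e-6):
    """Kernel FCM with a Gaussian kernel; returns (centers, U), U[k][i] = u_ik."""
    centers = farthest_first(data, c)
    dim = len(data[0])
    for _ in range(iters):
        U = [kernel_memberships(x, centers, m, sigma) for x in data]
        new_centers = []
        for i in range(c):
            # Center update weights: u_ik^m * K(x_k, v_i).
            w = [U[k][i] ** m * gaussian_kernel(x, centers[i], sigma)
                 for k, x in enumerate(data)]
            s = sum(w)
            new_centers.append([sum(wk * x[j] for wk, x in zip(w, data)) / s
                                for j in range(dim)])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    U = [kernel_memberships(x, centers, m, sigma) for x in data]
    return centers, U


def build_rules(data, labels, centers, U):
    """Hypothetical rule generation: each cluster i becomes the rule
    'IF x is near center_i THEN class = label_i', where label_i is the
    membership-weighted majority class of cluster i."""
    rules = []
    for i, v in enumerate(centers):
        votes = defaultdict(float)
        for k, y in enumerate(labels):
            votes[y] += U[k][i]
        rules.append((v, max(votes, key=votes.get)))
    return rules


def classify(x, rules, m=2.0, sigma=1.0):
    """Fire every rule with its kernel membership and return the class
    with the largest total firing strength."""
    u = kernel_memberships(x, [v for v, _ in rules], m, sigma)
    score = defaultdict(float)
    for ui, (_, y) in zip(u, rules):
        score[y] += ui
    return max(score, key=score.get)


if __name__ == "__main__":
    data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    labels = ["a", "a", "a", "b", "b", "b"]
    centers, U = kernel_fcm(data, c=2, sigma=3.0)
    rules = build_rules(data, labels, centers, U)
    print(classify((0.1, 0.1), rules, sigma=3.0), classify((5.0, 5.0), rules, sigma=3.0))
```

Measuring distance in the kernel-induced feature space, d² = 2(1 − K(x, v)), is what separates kernel FCM from plain FCM. With a Gaussian kernel, points far from a center contribute almost nothing to that center's update, which is why a well-separated initialization matters in this sketch.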
