Hybrid Ensemble Classifier for Stream Data

Data streams are continuous, unbounded, usually arrive at high speed, and have a data distribution that often changes over time. Processing them raises issues of memory, time, and the choice of data processing model. Because of this changing nature, data streams need dedicated handling, and a stream may be labeled or unlabeled. Classification is supervised, so it can handle only labeled data. This paper therefore proposes a Hybrid Ensemble Classifier that brings clustering and classification together; clustering is included because it can also handle unlabeled data streams. In this method the data stream is taken as input, and a windowing technique divides the large stream into small parts. The proposed Hybrid Ensemble Classifier is intended to improve performance in terms of accuracy.
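The pipeline described above (windowing, a clustering step for unlabeled items, and an ensemble of classifiers) can be illustrated with a minimal sketch. The abstract does not specify the base classifier, the clustering algorithm, the window size, or the voting rule, so everything below is an assumption made for illustration: a nearest-centroid model stands in for the classifier, unlabeled points are attached to the nearest class centroid as a simple clustering-like step, and the ensemble keeps the most recent models and predicts by majority vote. The names `windows`, `train_window`, and `HybridEnsemble` are hypothetical.

```python
from collections import Counter, deque
from itertools import islice
from math import dist


def windows(stream, size):
    """Yield consecutive fixed-size chunks of a (possibly unbounded) stream."""
    it = iter(stream)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def centroids_by_label(points):
    """Return the mean feature vector per label from (features, label) pairs."""
    sums, counts = {}, Counter()
    for x, y in points:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}


def nearest(model, x):
    """Label whose centroid is closest to x (Euclidean distance)."""
    return min(model, key=lambda y: dist(model[y], x))


def train_window(chunk):
    """Build one model from a window; a label of None marks an unlabeled item."""
    labeled = [(x, y) for x, y in chunk if y is not None]
    if not labeled:
        return None  # assumption: a window with no labels adds no model
    model = centroids_by_label(labeled)
    # Clustering-like step (assumed): attach each unlabeled point to the
    # nearest class centroid, then recompute centroids over all points.
    pseudo = [(x, nearest(model, x)) for x, y in chunk if y is None]
    return centroids_by_label(labeled + pseudo)


class HybridEnsemble:
    """Keeps the most recent per-window models and predicts by majority vote."""

    def __init__(self, max_members=3):
        self.members = deque(maxlen=max_members)

    def update(self, chunk):
        model = train_window(chunk)
        if model is not None:
            self.members.append(model)

    def predict(self, x):
        if not self.members:
            raise ValueError("ensemble has no trained members yet")
        votes = Counter(nearest(m, x) for m in self.members)
        return votes.most_common(1)[0][0]


# Toy stream of (features, label) pairs; None means the item is unlabeled.
stream = [
    ((0.0, 0.1), "a"), ((0.2, 0.0), None), ((5.0, 5.1), "b"), ((4.9, 5.2), None),
    ((0.1, 0.3), "a"), ((5.2, 4.8), "b"), ((0.3, 0.2), None), ((5.1, 5.0), None),
]
ensemble = HybridEnsemble()
for chunk in windows(stream, 4):
    ensemble.update(chunk)
print(ensemble.predict((0.2, 0.2)))
```

Bounding the ensemble with `deque(maxlen=...)` is one simple way to let older models expire as the distribution changes; a weighted or accuracy-based replacement policy would be an equally plausible choice that the abstract neither confirms nor rules out.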
