Review: Approaches for Handling DataStream

Today's widespread use of technology generates huge volumes of data in the form of data streams. Data streams are continuous and unbounded, usually arrive at high speed, and change over time. Handling them raises several issues, including memory and time constraints, noise, and dynamically changing data. Because of this changing nature, data streams need dedicated handling, and a stream may be labelled or unlabelled. Classification is a supervised technique, so it can handle only labelled data. Thus, there is a need for a Hybrid Ensemble Classifier, in which clustering and classification are brought together so that both labelled and unlabelled data streams can be handled. This paper describes different approaches for handling data streams.
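To make the idea of combining clustering with classification more concrete, the following is a minimal sketch and not the method of any particular paper. It assumes the stream arrives in chunks of (features, label) pairs, where the label is None for unlabelled items. The names CentroidModel, HybridEnsemble, learn_chunk and max_models are hypothetical and chosen only for illustration. Each chunk trains a simple nearest-centroid model from its labelled items, then assigns its unlabelled items to the nearest centroid (a clustering-style step) and lets them refine that centroid. The ensemble keeps only the most recent models and predicts by majority vote, which is one simple way to follow data that changes over time.

```python
from collections import Counter
from math import dist


class CentroidModel:
    """Nearest-centroid model built from a single chunk (illustrative only)."""

    def __init__(self):
        self.centroids = {}  # label -> (mean vector, number of points)

    def update(self, x, label):
        # Incrementally update the running mean for this label.
        mean, n = self.centroids.get(label, ([0.0] * len(x), 0))
        n += 1
        mean = [m + (xi - m) / n for m, xi in zip(mean, x)]
        self.centroids[label] = (mean, n)

    def predict(self, x):
        # Return the label of the closest centroid, or None if untrained.
        if not self.centroids:
            return None
        return min(self.centroids, key=lambda lbl: dist(self.centroids[lbl][0], x))


class HybridEnsemble:
    """Keeps the most recent chunk models and combines them by majority vote."""

    def __init__(self, max_models=3):
        self.max_models = max_models  # assumed ensemble size limit
        self.models = []

    def learn_chunk(self, chunk):
        model = CentroidModel()
        # Supervised step: labelled items define the class centroids.
        for x, y in chunk:
            if y is not None:
                model.update(x, y)
        # Clustering-style step: unlabelled items join the nearest centroid.
        for x, y in chunk:
            if y is None:
                guess = model.predict(x)
                if guess is not None:
                    model.update(x, guess)
        # Keep only the newest models so the ensemble follows the stream.
        self.models.append(model)
        self.models = self.models[-self.max_models:]

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.models)
        votes.pop(None, None)  # ignore models that could not predict
        return votes.most_common(1)[0][0] if votes else None
```

A caller would feed chunks with learn_chunk as they arrive and query predict at any time. Discarding the oldest models is only one possible forgetting policy; weighting models by recent accuracy is a common alternative.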
