Performance analysis of Hoeffding trees in data streams by using massive online analysis framework

This work examines the problem of classification on evolving data streams, comparing different Hoeffding trees within the Massive Online Analysis (MOA) framework. Advances in both hardware and software have led to the rapid storage of data in huge volumes. Such data is referred to as a data stream. Traditional data mining methods cannot handle data streams because of their ubiquitous nature. The challenge is how to store, analyse and visualise such large volumes of data, and massive data mining is a solution to this challenge. In the present analysis, five different Hoeffding trees are evaluated on the eight dataset generators available in the MOA framework, and the results indicate that the STAGGER generator is the best performer across the different classifiers.
