A Data Streams Analysis Strategy Based on Hoeffding Tree with Concept Drift on Hadoop System

Analyzing massive sensor data streams is important in Internet of Things monitoring applications, especially in environments that must support real-time storage and management of streaming data. To support the classification of massive sensor data streams, this paper proposes an analysis strategy based on the Hoeffding tree with concept drift for event monitoring applications on the Hadoop system. The proposed strategy is suitable for sensor data stream classification tasks on the MapReduce platform of the Hadoop system. The strategy is demonstrated on spatial sensing data stream processing operations and compared with existing solutions in a cloud computing environment. The simulation results show that the strategy achieves greater energy savings and retains only a small amount of sensor data in memory.
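For readers unfamiliar with Hoeffding trees, the split test they rely on can be sketched as follows. This is a minimal illustration of the standard Hoeffding bound, not the strategy proposed in this paper; the function names `hoeffding_bound` and `should_split` and the example parameter values are assumptions made for illustration only.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range: range R of the split metric (hypothetical parameter name)
    delta: allowed probability that the bound does not hold
    n: number of stream examples observed at the leaf
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Illustrative values only: the bound tightens as more examples arrive.
    for n in (100, 1000, 10000):
        print(n, hoeffding_bound(1.0, 1e-7, n))
```

Because the bound shrinks as `n` grows, a leaf can commit to a split after seeing only part of the stream, which is why this kind of tree needs to keep little raw data in memory.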
