An Efficient Data Streams Mining Method for Wireless Sensor Network's Data Aggregation

Wireless distributed sensor systems enable reliable monitoring of a variety of environments for both civil and military applications. Sensor networks produce their data as data streams. Because stream data arrives rapidly and the data set is very large, one-pass algorithms are needed to support data aggregation on demand. In this paper, we focus on data aggregation, which can have a significant impact on sensor network operation. VFDT, one of the most successful algorithms for data stream mining, uses the Hoeffding inequality to obtain a probabilistic bound on the accuracy of the constructed tree. We revisit this problem and propose an efficient algorithm for handling streaming data in wireless sensor networks. To evaluate the algorithm, we study its performance under different data noise levels, numbers of sensor network nodes, and data volumes. Overall, the techniques introduced here handle wireless sensor network data efficiently.
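To make the role of the Hoeffding inequality concrete, the following is a minimal sketch of the split test used by VFDT-style learners, not the algorithm proposed in this paper. It assumes the standard bound epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split metric, delta is the allowed failure probability, and n is the number of examples seen at a leaf. The function names `hoeffding_bound` and `should_split` are hypothetical and chosen only for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability at least 1 - delta, the
    true mean of a variable with the given range lies within epsilon of
    the mean observed over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Illustrative VFDT-style test: split a leaf once the observed gap
    between the two best attributes exceeds the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Assumed example values: two-class information gain (range 1.0),
    # delta = 1e-7, 1000 examples observed at the leaf.
    eps = hoeffding_bound(1.0, 1e-7, 1000)
    print(f"epsilon = {eps:.4f}")
    print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))
```

With these assumed values the bound is about 0.09, so an observed gain gap of 0.15 would justify a split after 1000 examples, while a gap of 0.05 would not. Because the bound shrinks as n grows, a leaf can wait for more stream data before committing to a split, which is what makes a single pass over the stream sufficient.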
