Learning from class-imbalanced data in wireless sensor networks

In this paper, we study wireless sensor networks used for the detection of rare events (e.g., intrusion). Each sensor node collects data points (examples) at regular time intervals and communicates them to a central base station (BS) over wireless links. Since sensor nodes have limited battery power, their energy consumption must be minimized. One way to do so is to reduce the number of sensor data packets transmitted. We incorporate machine learning strategies to intelligently reduce the amount of transmitted data, in order to increase the life-span of the sensors and thus the profitability of the system. In the proposed approach, after a short initialization period, the sensors obtain a classification model from the BS, which they use to detect interesting (positive) data points. Positive examples, together with selected negative examples, are then reported to the BS. Over time, the BS accumulates an abundance of negatives but only a limited number of positives, giving rise to what is termed the class-imbalance problem in learning. To understand the impact of network architecture on learning performance, two architectures are studied: cluster-based (LEACH) and tiered (UNPF). Using experiments on generated data sets, the paper analyzes the tradeoffs among prediction success, learning cost, packets transmitted, and energy consumed. The results show that the proposed learning mechanism significantly reduces energy consumption compared to the baseline system.
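
The node-side filtering step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the threshold classifier, the random sampling of negatives, and the names `make_threshold_model`, `select_for_transmission`, and `negative_keep_prob` are all assumptions made for illustration, since the abstract does not specify the model type or how negative examples are selected.

```python
import random


def make_threshold_model(threshold):
    """Hypothetical stand-in for the model sent by the BS.

    Assumption: a reading is 'positive' when it exceeds a fixed threshold.
    The abstract does not say which classifier is actually used.
    """
    return lambda reading: reading > threshold


def select_for_transmission(readings, model, negative_keep_prob, rng):
    """Return the (reading, label) pairs a node would report to the BS.

    Every predicted positive is reported. Each predicted negative is
    reported with probability negative_keep_prob (assumed selection rule).
    """
    reported = []
    for reading in readings:
        if model(reading):
            reported.append((reading, 1))
        elif rng.random() < negative_keep_prob:
            reported.append((reading, 0))
    return reported


# Simulated sensor readings: rare events are values far in the upper tail.
rng = random.Random(0)
readings = [rng.gauss(0.0, 1.0) for _ in range(1000)]
model = make_threshold_model(2.5)

reported = select_for_transmission(readings, model, 0.05, rng)
positives_reported = sum(label for _, label in reported)

print("readings collected:", len(readings))
print("packets reported:  ", len(reported))
print("positives reported:", positives_reported)
```

Under these assumptions the node transmits only a small fraction of what it collects, and the examples that reach the BS still contain far more negatives than positives once enough intervals have passed, which is the class-imbalance situation the abstract refers to.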
