SensoClean: Handling Noisy and Incomplete Data in Sensor Networks using Modeling

Sensor networks have seen tremendous growth in many domains, such as environmental monitoring. The data these sensor devices capture from the physical world, however, tend to be incomplete, noisy, and unreliable. Traditional data cleaning techniques cannot be applied to such data because they do not account for the strong spatial and temporal correlations typically present in sensor data. Popular data modeling methods such as Kalman filters and regression have shown good results in capturing spatio-temporal correlations. We implemented these methods in an extensible toolkit with graphical visualization and explored their effectiveness in cleaning sensor data. In our experiments, Kalman filters produced good data cleaning results. Regression with a high-order polynomial also showed promise but performed poorly on data with high variability.
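To make the Kalman-filter idea concrete, the following is a minimal sketch of how a single sensor stream with noise and gaps might be cleaned. It is not the toolkit's implementation: the function name `kalman_clean`, the scalar random-walk state model, the variance defaults, and the use of `None` to mark a missing reading are all assumptions chosen for illustration.

```python
from typing import List, Optional


def kalman_clean(readings: List[Optional[float]],
                 process_var: float = 1e-3,
                 measurement_var: float = 0.25) -> List[float]:
    """Scalar Kalman filter with a random-walk state model (illustrative).

    A reading of None marks a missing sample; for those steps the filter
    reports its prediction instead of performing a measurement update.
    """
    # Assumption: initialise the state from the first observed reading.
    first = next((r for r in readings if r is not None), None)
    if first is None:
        raise ValueError("no observed readings to initialise the filter")
    estimate, variance = first, measurement_var

    cleaned = []
    for z in readings:
        # Predict: the random-walk model keeps the estimate and
        # increases its uncertainty by the process variance.
        variance += process_var
        if z is not None:
            # Update: blend prediction and measurement via the Kalman gain.
            gain = variance / (variance + measurement_var)
            estimate += gain * (z - estimate)
            variance *= (1.0 - gain)
        cleaned.append(estimate)
    return cleaned


if __name__ == "__main__":
    # Hypothetical temperature trace with two gaps and one spike.
    temps = [20.1, 19.8, None, 20.4, 25.0, 20.0, None, 20.2]
    print([round(v, 2) for v in kalman_clean(temps)])
```

In this sketch a gap is filled with the filter's current prediction, and a spike such as the 25.0 reading is pulled toward the running estimate rather than passed through unchanged. The ratio of `process_var` to `measurement_var` controls how strongly readings are smoothed, and suitable values would depend on the sensor in question.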
