A Novel Machine Learning Approach Toward Quality Assessment of Sensor Data

This paper presents a novel machine learning approach to assessing the quality of sensor data using an ensemble classification framework. Sensor data quality is expressed through discrete quality flags, each indicating the level of uncertainty associated with a sensor reading. The acceptable level of uncertainty differs with the domain and the problem under consideration, so unsupervised methods such as outlier detection fail to match expectations. Quality flags are normally assigned by domain experts. Given the volume of sensor data, manual assignment is laborious and subject to human error. With a representative set of labelled data available, a supervised classification approach is therefore a feasible alternative. The nature of sensor data, however, poses challenges for the classification task. Data of dubious quality occurs in such data sets with very low frequency, leading to the class imbalance problem. We therefore adopt a cluster-oriented sampling approach to address the imbalance. In addition, training multiple classifiers is beneficial for improving overall classification accuracy. We produce multiple under-sampled training sets using cluster-oriented sampling and train a base classifier on each of them. The decisions of the base classifiers are fused into a single decision by majority voting. We evaluated the proposed ensemble classification framework by assessing the quality of marine sensor data obtained from sensors situated at Sullivans Cove, Hobart, Australia. Experimental results show that the proposed framework agrees with expert judgement with high accuracy and achieves better classification performance than other state-of-the-art approaches.
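The pipeline described in the abstract (cluster-oriented under-sampling of the majority class, one base classifier per under-sampled set, fusion by majority voting) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, the base classifier, or the sampling quotas, so everything below is an assumption made for illustration: plain k-means, a nearest-centroid classifier standing in for the base learner, proportional per-cluster quotas, the labels "good" and "bad", and synthetic two-feature readings. It is not the authors' implementation.

```python
import random
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, rng, iters=20):
    """Plain k-means (assumed clustering step); returns a cluster index per point."""
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = mean(members)
    return assign

def cluster_undersample(majority, n_keep, k, rng):
    """Draw roughly n_keep majority points, spread across k clusters."""
    assign = kmeans(majority, k, rng)
    sample = []
    for c in range(k):
        members = [p for p, a in zip(majority, assign) if a == c]
        if not members:
            continue
        # Hypothetical quota rule: proportional to cluster size, at least one.
        quota = max(1, round(n_keep * len(members) / len(majority)))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample

def train_centroid_classifier(good, bad):
    """Stand-in base classifier: label by the nearer class centroid."""
    cg, cb = mean(good), mean(bad)
    return lambda x: "good" if dist2(x, cg) <= dist2(x, cb) else "bad"

def train_ensemble(good, bad, n_models=5, k=3, seed=0):
    """Train one base classifier per under-sampled training set."""
    rng = random.Random(seed)
    return [
        train_centroid_classifier(cluster_undersample(good, len(bad), k, rng), bad)
        for _ in range(n_models)
    ]

def predict(ensemble, x):
    """Fuse the base decisions by majority voting."""
    votes = Counter(model(x) for model in ensemble)
    return votes.most_common(1)[0][0]

# Synthetic, imbalanced data (illustrative only): 200 "good" vs 10 "bad" readings.
data_rng = random.Random(42)
good = [[data_rng.gauss(15.0, 1.0), data_rng.gauss(35.0, 1.0)] for _ in range(200)]
bad = [[data_rng.gauss(25.0, 1.0), data_rng.gauss(20.0, 1.0)] for _ in range(10)]

ensemble = train_ensemble(good, bad)
print(predict(ensemble, [15.0, 35.0]))
print(predict(ensemble, [25.0, 20.0]))
```

Using an odd number of base classifiers avoids ties in a two-class vote. Each call to `cluster_undersample` re-runs the clustering with a different random draw, which is one simple way to make the under-sampled training sets differ from one another.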
