Outlier detection of air temperature series data using probabilistic finite state automata-based algorithm

This article proposes a probabilistic finite state automata-based algorithm (PFSAA) for detecting outliers in air temperature series data caused by sensor errors. The algorithm first divides the training samples of the air temperature series into subclusters, which are then used to build the finite state automata through splitting and combining techniques. Next, it constructs a dynamic transition matrix for the probabilistic finite state automata (PFSA) based on probability theory. Finally, PFSAA detects the outliers in the remaining test samples. The proposed algorithm is validated quantitatively against reference data and a traditional backpropagation (BP) neural network model. © 2012 Wiley Periodicals, Inc. Complexity, 2012
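The general pipeline (partition training data into automaton states, estimate a probabilistic transition matrix, then score test samples by how probable each transition is) can be illustrated with a minimal Python/NumPy sketch. It is not the authors' PFSAA, and several pieces are assumptions: an equal-frequency (quantile) partition stands in for the paper's subclustering with splitting and combining techniques, "dynamic" is read as updating transition counts online with accepted transitions, and the class name `PFSAOutlierDetector` and its parameters `n_states`, `alpha` (additive smoothing), and `threshold` are hypothetical.

```python
import numpy as np


class PFSAOutlierDetector:
    """Hypothetical PFSA-style outlier detector; a sketch, not the paper's PFSAA."""

    def __init__(self, n_states=8, alpha=1.0, threshold=0.02):
        self.n_states = n_states    # number of automaton states (symbols)
        self.alpha = alpha          # additive smoothing placed in every transition cell
        self.threshold = threshold  # transitions less probable than this are flagged

    def _symbolize(self, x):
        # Map each reading to a state index 0..n_states-1 via the partition edges.
        return np.digitize(x, self.edges)

    def fit(self, train):
        train = np.asarray(train, dtype=float)
        # Assumption: an equal-frequency (quantile) partition stands in for the
        # paper's subclustering with splitting and combining techniques.
        inner_q = np.linspace(0.0, 1.0, self.n_states + 1)[1:-1]
        self.edges = np.quantile(train, inner_q)
        s = self._symbolize(train)
        # Count observed state-to-state transitions, starting from alpha in every cell.
        self.counts = np.full((self.n_states, self.n_states), self.alpha)
        np.add.at(self.counts, (s[:-1], s[1:]), 1.0)
        return self

    def transition_matrix(self):
        # Row-normalize: P[i, j] estimates Pr(next state = j | current state = i).
        return self.counts / self.counts.sum(axis=1, keepdims=True)

    def detect(self, test, update=True):
        s = self._symbolize(np.asarray(test, dtype=float))
        flags = np.zeros(len(s), dtype=bool)
        prev = s[0]  # the first reading has no predecessor and is never flagged
        for t in range(1, len(s)):
            if self.transition_matrix()[prev, s[t]] < self.threshold:
                # Improbable transition: flag reading t and keep the last accepted
                # state, so the return from a spike is not flagged as well.
                flags[t] = True
            else:
                if update:
                    # Assumed reading of "dynamic": learn online from accepted transitions.
                    self.counts[prev, s[t]] += 1.0
                prev = s[t]
        return flags


# Usage on synthetic data (hypothetical, not the paper's data set):
# ten "days" of a smooth 96-sample cycle for training, one more day for testing.
train = 20 + 5 * np.sin(2 * np.pi * np.arange(960) / 96)
test = 20 + 5 * np.sin(2 * np.pi * np.arange(960, 1056) / 96)
test[48] = 60.0  # injected sensor-error spike
detector = PFSAOutlierDetector(n_states=8, threshold=0.02).fit(train)
print(np.flatnonzero(detector.detect(test)))
```

On this smooth synthetic series, consecutive readings only move to the same or a neighboring state, so those transitions get high estimated probabilities, while the jump into the top state caused by the spike was never observed and falls below the threshold. Not advancing the current state past a flagged reading is a design choice of this sketch that keeps a single spike from producing two alarms.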
