Detecting Outliers in Streaming Time Series Data from ARM Distributed Sensors

The Atmospheric Radiation Measurement (ARM) Data Center at ORNL collects data from a number of permanent and mobile facilities around the globe. The data are then ingested to create higher-level scientific products. High-frequency streaming measurements from sensors and radar instruments at ARM sites require a high degree of accuracy to enable rigorous study of atmospheric processes. Outliers in the collected data are common, caused by instrument failure or by extreme weather events, so it is critical to identify and flag them. We employed multiple univariate, multivariate, and time series outlier detection techniques and studied their effectiveness. First, we examined the Pearson correlation coefficient, which measures the pairwise correlations between variables. Second, Singular Spectrum Analysis (SSA) was applied to remove the anticipated annual and seasonal cycles from the signal so that anomalies stand out. Third, K-means clustering was applied for multivariate examination of data from a collection of sensors, to identify deviations from expected and known patterns and thereby flag abnormal observations. The Pearson correlation coefficient, SSA, and K-means methods were then combined in a framework that detects outliers through a range of checks. We applied the developed method to data from meteorological sensors at the ARM Southern Great Plains site and validated the results against an existing database of known data quality issues.
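
The three checks described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, window lengths, thresholds, the MAD-based scoring rule, and the deterministic K-means initialization are all assumptions chosen for illustration, and the SSA step simply subtracts the leading components rather than selecting specific annual or seasonal ones.

```python
import numpy as np

def robust_flags(score, n_sigmas):
    """Flag scores far above the median, with MAD as a robust scale (assumed rule)."""
    med = np.median(score)
    scale = 1.4826 * np.median(np.abs(score - med))
    return score > med + n_sigmas * scale

def pearson_flags(a, b, window=24, min_r=0.5):
    """Flag non-overlapping windows where two normally correlated variables
    stop tracking each other. Trailing samples short of a full window are
    left unflagged; NaN correlations also count as flagged."""
    flags = np.zeros(len(a), dtype=bool)
    for start in range(0, len(a) - window + 1, window):
        sl = slice(start, start + window)
        with np.errstate(invalid="ignore", divide="ignore"):
            r = np.corrcoef(a[sl], b[sl])[0, 1]
        flags[sl] = not (r >= min_r)
    return flags

def ssa_flags(x, window=48, n_components=2, n_sigmas=5.0):
    """Remove the leading SSA components (the smooth/periodic part) and
    flag samples with unusually large residuals."""
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    low_rank = (u[:, :n_components] * s[:n_components]) @ vt[:n_components]
    # Diagonal averaging maps the low-rank matrix back to a series.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        recon[j:j + window] += low_rank[:, j]
        counts[j:j + window] += 1
    resid = np.abs(x - recon / counts)
    return robust_flags(resid, n_sigmas)

def kmeans_flags(points, k=2, n_iter=20, n_sigmas=5.0):
    """Cluster multivariate observations with Lloyd's algorithm and flag
    points unusually far from their nearest centroid."""
    # Deterministic initialization from evenly spaced rows (an assumption).
    idx = np.linspace(0, len(points) - 1, k).astype(int)
    centers = points[idx].astype(float)
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # keep the old centroid if a cluster empties
                centers[c] = points[labels == c].mean(axis=0)
    dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return robust_flags(dist.min(axis=1), n_sigmas)
```

One simple way to combine the checks, again as an assumption rather than the framework used in the study, is a logical OR of the boolean masks, for example `pearson_flags(a, b) | ssa_flags(a)`, so that a sample is flagged when any check fires.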
