Outlier detection and quasi-periodicity optimization algorithm: Frequency domain based outlier detection (FOD)

Abstract Outlier detection is one of the main challenges in the pre-processing stage of data analysis. In this study, we propose a new non-parametric outlier detection technique based on the frequency domain and the Fourier transform, which we call frequency-domain based outlier detection (FOD). From simulation results under various distributions and from real data applications, we observe that the proposed approach detects quasi-periodic outliers in time series data more successfully than commonly used methods such as the z-score and the box-plot, and that it is faster than some specialized methods such as the Grubbs method and the autonomous anomaly detection (AAD) method. We therefore consider the proposed approach a viable alternative for finding quasi-periodic outliers in time series data.
