Parameter-Free Search of Time-Series Discord

Time-series discords are widely used in data mining applications to characterize anomalous subsequences in time series. Compared with some other discord search algorithms, the direct search algorithm based on the recurrence plot has the advantage of being fast and parameter-free. The direct search algorithm, however, relies on quasi-periodicity in the input time series, an assumption that limits its applicability. In this paper, we eliminate the periodicity assumption from the direct search algorithm by proposing a reference function for subsequences and a new sampling strategy based on that reference function. The result is a new algorithm with improved efficiency and robustness, as evidenced by our empirical evaluation.
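
For readers unfamiliar with the term, a discord is commonly defined as the subsequence whose distance to its nearest non-overlapping neighbor is the largest. The sketch below illustrates that common definition with a brute-force baseline. It is not the direct search algorithm, nor the algorithm proposed in this paper; the function names, the subsequence length `m`, and the use of plain Euclidean distance are assumptions made only for illustration.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, m):
    """Return (start, distance) of the length-m discord in `series`.

    Illustrative baseline only: it compares every subsequence with every
    non-overlapping subsequence, so it takes quadratic time in the number
    of subsequences.
    """
    n = len(series) - m + 1  # number of length-m subsequences
    if m < 1 or len(series) < 2 * m:
        raise ValueError("series must hold two non-overlapping subsequences")

    best_start, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            # Skip trivial matches: subsequences that overlap with i.
            if abs(i - j) >= m:
                d = euclidean(series[i:i + m], series[j:j + m])
                nearest = min(nearest, d)
        # Ignore positions that have no non-overlapping neighbor at all.
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = i, nearest
    return best_start, best_dist


# A periodic toy series with one corrupted point at index 21.
toy = [0, 1, 0, -1] * 10
toy[21] = 5
start, dist = brute_force_discord(toy, m=4)
```

On this toy input the reported discord overlaps the corrupted point, since every other subsequence has an identical copy elsewhere in the series.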
