Related Papers

Abstract: Catalogs of periodic variable stars contain large numbers of periodic light curves (photometric time series from the astrophysics domain). Separating anomalous objects from well-known classes is an important step towards the discovery of new classes of astronomical objects. Most anomaly detection methods for time series data assume either a single continuous time series or a set of time series whose periods are aligned. Light-curve data preclude the use of these methods because the periods of any given pair of light curves may be out of sync. One may use an existing anomaly detection method only after aligning each pair of light curves before computing their similarity, a costly operation that scales poorly to massive data sets. This paper presents PCAD, an unsupervised anomaly detection method for large sets of unsynchronized periodic time series that outputs a ranked list of both global and local anomalies. PCAD calculates the anomaly score of each light curve relative to a set of centroids produced by a modified k-means clustering algorithm, and it scales to large data sets through sampling. We validate our method on light-curve data and on other time series data sets. We demonstrate its effectiveness at finding known anomalies and discuss how sample size and the number of centroids affect the results. We compare our method with naive solutions and with existing time series anomaly detection methods for unphased data, and show that the anomalies PCAD reports are comparable to or better than those of all other methods. Finally, astrophysicists on our team have verified that PCAD finds true anomalies that might be indicative of novel astrophysical phenomena.
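
The abstract describes PCAD only at a high level, so the following is a minimal, illustrative sketch of the general idea, not the authors' algorithm. It assumes that every light curve has already been folded on its period and resampled to the same number of phase bins; that the phase-invariant distance is the minimum Euclidean distance over circular shifts, d(x, y) = min_s ||x - rot_s(y)||; that the "modified" k-means simply phase-aligns each member to its centroid before averaging; that sampling means clustering a random subset and then scoring every curve; and that the anomaly score is the distance to the nearest centroid. All function names (z_normalize, rotate, best_shift, phase_kmeans, rank_anomalies) are hypothetical, and the paper's actual similarity measure, centroid update, and separate ranking of global and local anomalies are not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple

Curve = List[float]


def z_normalize(x: Sequence[float]) -> Curve:
    """Rescale a curve to zero mean and unit variance; a flat curve maps to zeros."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [0.0] * n if std == 0 else [(v - mean) / std for v in x]


def rotate(y: Sequence[float], s: int) -> Curve:
    """Circularly shift a phase-folded curve so that result[i] == y[(i + s) % n]."""
    return list(y[s:]) + list(y[:s])


def best_shift(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int]:
    """Brute-force phase alignment: return (distance, s) minimising the Euclidean
    distance between x and rotate(y, s). This is O(n^2); computing the circular
    cross-correlation with an FFT would bring it down to O(n log n)."""
    n = len(x)
    best = (math.inf, 0)
    for s in range(n):
        d = math.sqrt(sum((x[i] - y[(i + s) % n]) ** 2 for i in range(n)))
        if d < best[0]:
            best = (d, s)
    return best


def phase_kmeans(curves: List[Curve], k: int, iters: int = 10, seed: int = 0) -> List[Curve]:
    """Assumed k-means variant: each member is phase-aligned to its centroid before averaging."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(curves, k)]
    for _ in range(iters):
        members: List[List[Curve]] = [[] for _ in range(k)]
        for c in curves:
            fits = [best_shift(cen, c) for cen in centroids]
            j = min(range(k), key=lambda idx: fits[idx][0])
            members[j].append(rotate(c, fits[j][1]))  # store the aligned copy
        for j, group in enumerate(members):
            if group:  # keep the previous centroid if a cluster empties out
                n = len(group[0])
                mean = [sum(m[i] for m in group) / len(group) for i in range(n)]
                centroids[j] = z_normalize(mean)
    return centroids


def rank_anomalies(curves: List[Sequence[float]], k: int = 2, sample_size: int = 100,
                   seed: int = 0) -> Tuple[List[int], List[float]]:
    """Cluster a random sample, then score every curve by its phase-invariant distance
    to the nearest centroid. Returns (indices ranked most anomalous first, scores)."""
    rng = random.Random(seed)
    normed = [z_normalize(c) for c in curves]
    sample = rng.sample(normed, min(sample_size, len(normed)))
    centroids = phase_kmeans(sample, k, seed=seed)
    scores = [min(best_shift(cen, c)[0] for cen in centroids) for c in normed]
    ranking = sorted(range(len(curves)), key=lambda i: scores[i], reverse=True)
    return ranking, scores


if __name__ == "__main__":
    n = 20
    # Twenty copies of one sinusoid with different phase offsets, plus a one-bin spike.
    normal = [[math.sin(2 * math.pi * (i + s) / n) for i in range(n)] for s in range(n)]
    spike = [0.0] * (n - 1) + [1.0]
    ranking, scores = rank_anomalies(normal + [spike], k=1)
    print("most anomalous index:", ranking[0])
```

In this toy run the phase-shifted sinusoids all align to the same centroid, so their scores stay near zero regardless of their original phase, while the spike (index 20) cannot be aligned to the sinusoidal shape and should rank first. Clustering only the sample keeps the expensive alignment-heavy step small, while the final scoring pass still costs one alignment per curve per centroid.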

Cited By
An Outlier Detection Method to Improve Gathered Datasets for Network Behavior Analysis in IoT. J. Commun., 2019.

On the effect of endpoints on dynamic time warping. 2016.

Prefix and Suffix Invariant Dynamic Time Warping. 2016 IEEE 16th International Conference on Data Mining (ICDM), 2016.

Classification and Anomaly Detection for Astronomical Survey Data. 2013.

Anomaly detection in the Zwicky Transient Facility DR3. Monthly Notices of the Royal Astronomical Society, 2020.

Forecasting Ability of a Periodic Component Extracted from Large-Cap Index Time Series. 2017.

Time series anomaly detection based on shapelet learning. Comput. Stat., 2018.

Time Series Join on Subsequence Correlation. 2014 IEEE International Conference on Data Mining, 2014.

Multi-sensor event detection using shape histograms. CODS, 2014.

Clustering of large time-series datasets using a multi-step approach (Saeed Reza Aghabozorgi Sahaf Yazdi). 2013.

Anomaly detection of time series. 2010.

Parameter-Free Search of Time-Series Discord. Journal of Computer Science and Technology, 2013.

Computational Intelligence Challenges and Applications on Large-Scale Astronomical Time Series Databases. IEEE Computational Intelligence Magazine, 2014.

A New Method for Outlier Detection on Time Series Data. 2015.

Clustering Unsynchronized Time Series Subsequences with Phase Shift Weighted Spherical k-means Algorithm. J. Comput., 2014.

Recovery of Missing Values using Matrix Decomposition Techniques. 2015.

Explainable Time Series Tweaking via Irreversible and Reversible Temporal Transformations. 2018 IEEE International Conference on Data Mining (ICDM), 2018.

Locally and globally explainable time series tweaking. Knowledge and Information Systems, 2019.

Classification and anomaly detection for astronomical datasets. 2012.

Anomaly detection for symbolic sequences and time series data. 2009.