Searching and Mining Trillions of Time Series Subsequences under Dynamic Time Warping
Abstract: Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than are currently possible.
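For readers unfamiliar with the core subroutine the abstract refers to, the following is a minimal sketch of brute-force subsequence similarity search under DTW. It is not the method of this work and does not implement any of the four ideas mentioned above; it only illustrates the baseline problem. The function names (`znorm`, `dtw`, `best_match`), the z-normalization of each candidate, the squared point-wise cost, and the band parameter `window` are all assumptions made for illustration.

```python
import math


def znorm(x):
    """Z-normalize a sequence (zero mean, unit variance); constant input maps to zeros."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in x]


def dtw(a, b, window):
    """Sum of squared costs along the best warping path between two
    equal-length sequences, restricted to a band |i - j| <= window."""
    n = len(a)
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, n + 1):
        curr = [inf] * (n + 1)
        lo = max(1, i - window)
        hi = min(n, i + window)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[n]


def best_match(series, query, window):
    """Scan every subsequence of len(query) and return (position, distance)
    of the nearest one under band-constrained DTW."""
    m = len(query)
    q = znorm(query)
    best_dist, best_pos = float("inf"), -1
    for start in range(len(series) - m + 1):
        cand = znorm(series[start:start + m])
        d = dtw(q, cand, window)
        if d < best_dist:
            best_dist, best_pos = d, start
    return best_pos, best_dist


if __name__ == "__main__":
    series = [0, 0, 0, 1, 3, 1, 0, 0, 0, 0]
    query = [2, 6, 2]  # same shape as series[3:6], different scale
    print(best_match(series, query, window=1))
```

Each candidate costs O(m · window) time in this sketch, and every one of the n − m + 1 subsequences is evaluated in full, which is why naive search of this kind becomes the bottleneck the abstract describes.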