Related Papers

Abstract: Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than are currently possible.

Cited by
Ecosystem on the Web: Non-Linear Dynamical Systems for Online Social Activities. 2017.
Knowledge Science, Engineering and Management: 13th International Conference, KSEM 2020, Hangzhou, China, August 28–30, 2020, Proceedings, Part I. KSEM, 2020.
Effective Sub-Sequence-Based Dynamic Time Warping. SGAI Conf., 2019.
Segmenting, Summarizing and Predicting Data Sequences. 2018.
Comparing temporal graphs using dynamic time warping. Social Network Analysis and Mining, 2020.
Individualized Modeling to Distinguish Between High and Low Arousal States Using Physiological Data. Journal of Healthcare Informatics Research, 2020.
A Novel Measure for Trajectory Similarity. ICGEC, 2019.
Efficient Query Processing in Time Series. SIGMOD PhD Symposium, 2015.
Behavior Analysis for Electronic Commerce Trading Systems: A Survey. IEEE Access, 2019.
Mining Linguistic Tone Patterns Using Fundamental Frequency Time-Series Data. 2017.
Fast trajectory search for real-world applications. 2019.
Structural health monitoring meets data mining. 2014.
Design of Management Platform Architecture and Key Algorithm for Massive Monitoring Big Data. Wireless Communications and Mobile Computing, 2021.
Codebook-based electrooculography data analysis towards cognitive activity recognition. Comput. Biol. Medicine, 2018.
Cryptomining Cannot Change Its Spots: Detecting Covert Cryptomining Using Magnetic Side-Channel. IEEE Transactions on Information Forensics and Security, 2020.
Scaling learning algorithms using locality sensitive hashing. 2018.
Profiling spatial and temporal behaviour in sensor networks: A case study in energy monitoring. 2014 IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), 2014.
Enhancing Performance of Magnetic Field Based Indoor Localization Using Magnetic Patterns from Multiple Smartphones. Sensors, 2020.
On the effect of endpoints on dynamic time warping. 2016.
Prefix and Suffix Invariant Dynamic Time Warping. 2016 IEEE 16th International Conference on Data Mining (ICDM), 2016.