Discovering unusual and non-trivial patterns in massive time series databases

Time series is perhaps the most commonly encountered data type, touching almost every aspect of human life, including medicine (ECG, EEG data), finance (stock market data, credit card usage data), aerospace (launch telemetry, satellite sensor data), and entertainment (music, movies). Apart from the obvious problem of handling the typically massive size of time series databases (gigabytes or even terabytes are not uncommon), most classic machine learning and data mining algorithms do not work well on time series because of the data's unique structure. In particular, the high dimensionality, very high feature correlation, and (typically) large amount of noise that characterize time series data present a difficult challenge. The emphasis of this work is on the discovery of important patterns in time series data. Previous work in this area has concentrated mostly on the identification of previously known patterns. The major distinction of this work is that it offers the ability to discover important, previously unknown patterns in an effective and automated manner. These significant patterns can manifest themselves either as frequently repeated patterns, which we formally define as “time series motifs,” or as anomalous patterns. A novel symbolic representation for time series data is introduced, on the basis of which promising solutions for motif discovery, anomaly detection, and visualization are proposed. Furthermore, the topic of time series clustering is also covered, including a discussion of the validity of subsequence clustering and the introduction of an efficient multi-resolution anytime algorithm for whole clustering.