Cluster and Calendar Based Visualization of Time Series Data
A new method is presented to get an insight into univariate time series data. The problem addressed is how to identify patterns and trends on multiple time scales (days, weeks, seasons) simultaneously. The solution presented is to cluster similar daily data patterns, and to visualize the average patterns as graphs and the corresponding days on a calendar. This presentation provides a quick insight into both standard and exceptional patterns. Furthermore, it is well suited to interactive exploration. Two applications, numbers of employees present and energy consumption, are presented.
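The cluster-then-calendar idea can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not name the clustering algorithm, so plain k-means with a deterministic farthest-point initialisation is used as a stand-in, and the function names (`cluster_days`, `calendar_rows`), the toy profiles, and the start date are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length daily profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_days(profiles, k, iters=20):
    """Stand-in clustering (k-means); assumes len(profiles) >= k."""
    # Farthest-point initialisation keeps the result deterministic.
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        far = max(profiles, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(profiles)
    for _ in range(iters):
        # Assign every day to its nearest average pattern.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in profiles]
        # Recompute each average pattern from its member days.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def calendar_rows(start, labels):
    """One text row per ISO week (Mon..Sun); '.' marks days without data."""
    rows = defaultdict(lambda: ["."] * 7)
    for offset, lab in enumerate(labels):
        day = start + timedelta(days=offset)
        iso = day.isocalendar()
        rows[(iso[0], iso[1])][day.weekday()] = str(lab)
    return [" ".join(rows[key]) for key in sorted(rows)]


# Toy data (assumed): two weeks, four samples per day, busy weekdays, quiet weekends.
start = date(2024, 1, 1)  # a Monday
profiles = [[1, 2, 2, 1] if (start + timedelta(days=i)).weekday() >= 5
            else [1, 9, 9, 1] for i in range(14)]
labels, centroids = cluster_days(profiles, k=2)
for row in calendar_rows(start, labels):
    print(row)
```

Each row of the printed grid is one week, and each cell holds the cluster label of that day, so recurring weekly structure and exceptional days show up as visual patterns while `centroids` holds the average daily curves to plot alongside.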
Modeling of multivariate time series using hidden Markov models
Vector-valued (or multivariate) time series data commonly occur in various sciences. While modeling of univariate time series is well studied, modeling of multivariate time series, especially finite-valued or categorical ones, has been relatively unexplored. In this dissertation, we employ hidden Markov models (HMMs) to capture temporal and multivariate dependencies in multivariate time series data. We modularize the process of building such models by separating the modeling of temporal dependence, multivariate dependence, and non-stationary behavior. We also propose new methods of modeling multivariate dependence for categorical and real-valued data while drawing parallels between these two seemingly different types of data. Since this work is in part motivated by the problem of predicting precipitation over geographic regions from multiple weather stations, we present in detail models pertinent to this hydrological application and perform a thorough analysis of the models on data collected from a number of different geographic regions.
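As a rough illustration of how an HMM separates temporal dependence (the hidden-state transitions) from multivariate dependence (the emission model), the sketch below evaluates the likelihood of a multivariate categorical sequence with the scaled forward algorithm. It assumes the simplest possible emission model, in which the stations are conditionally independent given the hidden state; the dissertation proposes richer dependence models that are not reproduced here. The function name and all parameter values are hypothetical.

```python
import math


def forward_loglik(obs, init, trans, emit):
    """Log-likelihood of a multivariate categorical sequence under an HMM.

    obs   : list of tuples, one symbol per station per time step
    init  : init[s]       = P(state at t=0 is s)
    trans : trans[r][s]   = P(next state is s | current state is r)
    emit  : emit[s][j][v] = P(station j shows symbol v | state s)
    Assumes stations are conditionally independent given the state and
    that the sequence has non-zero probability under the model.
    """
    n = len(init)

    def emission(s, x):
        # Product over stations: the conditional-independence assumption.
        p = 1.0
        for j, v in enumerate(x):
            p *= emit[s][j][v]
        return p

    alpha = [init[s] * emission(s, obs[0]) for s in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emissions.
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n))
                     * emission(s, obs[t]) for s in range(n)]
        # Rescale at every step to avoid underflow on long sequences.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


# Toy two-state model (assumed values): state 0 = "dry regime", state 1 = "wet regime";
# two stations, each reporting 0 (no rain) or 1 (rain).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [
    [[0.9, 0.1], [0.8, 0.2]],  # dry regime: rain unlikely at both stations
    [[0.2, 0.8], [0.3, 0.7]],  # wet regime: rain likely at both stations
]
print(forward_loglik([(0, 0), (1, 1), (1, 1)], init, trans, emit))
```

Swapping the `emission` helper for a model with explicit inter-station dependence changes only the multivariate part and leaves the temporal recursion untouched, which is the kind of modularity the abstract describes.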
Anomaly detection for symbolic sequences and time series data
This thesis deals with the problem of anomaly detection for sequence data. Anomaly detection has been a widely researched problem in several application domains such as system health management, intrusion detection, health care, bioinformatics, fraud detection, and mechanical fault detection. Traditional anomaly detection techniques analyze each data instance (as a univariate or multivariate record) independently, and ignore the sequential aspect of the data. Often, anomalies in sequences can be detected only by analyzing data instances together as a sequence, and hence cannot be detected by traditional anomaly detection techniques. The problem of anomaly detection for sequence data is a rich area of research for two main reasons. First, sequences can be of different types, e.g., symbolic sequences and time series data, and each type of sequence poses a unique set of problems. Second, anomalies in sequences can be defined in multiple ways, and hence there are different problem formulations. In this thesis we focus on solving one particular problem formulation called semi-supervised anomaly detection. We study the problem separately for symbolic sequences, univariate time series data, and multivariate time series data. The state of the art in anomaly detection for sequences is limited and fragmented across application domains. For symbolic sequences, several techniques have been proposed within specific domains, but it is not well understood how a technique developed for one domain would perform in a completely different domain. For univariate time series data, few techniques exist, and they have been evaluated only for specific domains, while for multivariate time series data, anomaly detection research is relatively untouched. This thesis has two key goals. The first goal is to develop novel anomaly detection techniques for different types of sequences which perform better than existing techniques across a variety of application domains. 
The second goal is to identify the best anomaly detection technique for a given application domain. By realizing the first goal, we develop a suite of anomaly detection techniques for a domain scientist to choose from, while the second goal will help the scientist to choose the technique best suited for the task. To achieve the first goal, we develop several novel anomaly detection techniques for univariate symbolic sequences, univariate time series data, and multivariate time series data. We provide extensive experimental evaluation of the proposed techniques on data sets collected across diverse domains and generated from data generators, also developed as part of this thesis. We show how the proposed techniques can be used to detect anomalies which translate to critical events in domains such as aircraft safety, intrusion detection, and patient health management. The techniques proposed in this thesis are shown to outperform existing techniques on many data sets. The technique proposed for multivariate time series data is one of the very first anomaly detection techniques that can detect complex anomalies in such data. To achieve the second goal, we study the relationship between anomaly detection techniques and the nature of the data on which they are applied. A novel analysis framework, Reference Based Analysis (RBA), is proposed that can map a given data set (of any type) into a multivariate continuous space with respect to a reference data set. We apply the RBA framework not only to visualize and understand complex data types, such as multivariate categorical data and symbolic sequence data, but also to extract data-driven features from symbolic sequences, which, when used with traditional anomaly detection techniques, are shown to consistently outperform the state-of-the-art anomaly detection techniques for these complex data types. 
Two novel techniques for symbolic sequences, WIN1D and WIN2D, are proposed using the RBA framework; they perform better than the best existing technique on each data set.
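The semi-supervised setting for symbolic sequences can be illustrated with a generic fixed-length window detector: a model of normal behavior is built from anomaly-free training sequences only, and a test sequence is scored by how many of its windows were never seen in training. This is a minimal sketch of that general idea, not the thesis's WIN1D, WIN2D, or RBA methods; the function names, the window length `k`, and the toy sequences are assumptions made for the example.

```python
def windows(seq, k):
    """All contiguous length-k windows of a symbolic sequence, as tuples."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def fit_normal(training_seqs, k):
    """Collect every window observed in the (assumed anomaly-free) training data."""
    seen = set()
    for s in training_seqs:
        seen.update(windows(s, k))
    return seen


def anomaly_score(seq, normal_windows, k):
    """Fraction of windows in seq that never occurred in the training data."""
    w = windows(seq, k)
    if not w:
        return 0.0  # sequence shorter than k: nothing to compare
    unseen = sum(1 for x in w if x not in normal_windows)
    return unseen / len(w)


# Toy example (assumed data): normal behavior is a repeating "abc" cycle.
k = 3
normal = fit_normal(["abcabcabc", "bcabcabca"], k)
print(anomaly_score("abcabcabc", normal, k))  # normal test sequence
print(anomaly_score("abcxbcabc", normal, k))  # one foreign symbol injected
```

A single foreign symbol corrupts every window that overlaps it, so the score rises with both the number of anomalous positions and the window length; choosing `k` is therefore a trade-off between sensitivity and the amount of training data needed to cover normal windows.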