Time series novelty detection with application to production sensor systems

Modern fiber manufacturing plants rely heavily on automation. Automated facilities use sensors to measure fiber state and react to data patterns that correspond to physical events. Many patterns can be predefined, either through careful analysis or by domain experts, and instances of them can then be found with techniques such as pattern recognition. However, pattern recognition fails to detect events that have not been predefined, which can cause expensive production errors. Novelty detection addresses this problem by identifying interesting data patterns embedded in otherwise normal data. In this thesis we investigate aspects of implementing novelty detection in a fiber manufacturing system. Specifically, we empirically evaluate the effectiveness of currently available feature extraction and novelty detection techniques on data from a real fiber manufacturing system.

Our results show that piecewise linear approximation (PLA) methods produce the highest quality features for fiber property datasets. Motivated by this result, we introduce a new PLA algorithm called improved bottom up segmentation (IBUS). In our application, this algorithm produced the highest quality features and considerably more data reduction than all currently available feature extraction techniques.

Further empirical results from several leading time series novelty detection techniques support two conclusions. When no feature extraction is used, a simple Euclidean distance based technique performs best overall. When feature extraction is used, the Tarzan technique performs best.
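As a rough illustration of the two families of techniques named above, the following Python sketch shows a classic bottom-up PLA segmentation and a nearest-neighbour Euclidean distance novelty score. It is not the IBUS algorithm or the exact Euclidean technique evaluated in the thesis, whose details are not given here. The function names, the `max_error` threshold, and the `window` length are all assumptions made for the example.

```python
import numpy as np


def fit_error(y, start, end):
    """Sum of squared residuals of a least-squares line over y[start:end]."""
    n = end - start
    if n <= 2:
        return 0.0  # one or two points are fit exactly by a line
    x = np.arange(n)
    coeffs = np.polyfit(x, y[start:end], 1)
    residuals = y[start:end] - np.polyval(coeffs, x)
    return float(residuals @ residuals)


def bottom_up_segments(y, max_error):
    """Classic bottom-up PLA (hypothetical helper, not IBUS).

    Starts from the finest segmentation and repeatedly merges the
    cheapest adjacent pair of segments until the cheapest merge
    would exceed max_error. Returns (start, end) index pairs.
    """
    y = np.asarray(y, dtype=float)
    # Segment boundaries: initially every segment covers two points.
    bounds = list(range(0, len(y), 2)) + [len(y)]
    while len(bounds) > 2:
        costs = [fit_error(y, bounds[i], bounds[i + 2])
                 for i in range(len(bounds) - 2)]
        i = int(np.argmin(costs))
        if costs[i] > max_error:
            break
        del bounds[i + 1]  # merge segments i and i + 1
    return list(zip(bounds[:-1], bounds[1:]))


def euclidean_novelty_scores(reference, test, window):
    """Score each test window by its distance to the nearest reference window.

    A large score means no similar window exists in the (assumed normal)
    reference series, so the window is a candidate novelty.
    """
    ref = np.lib.stride_tricks.sliding_window_view(
        np.asarray(reference, dtype=float), window)
    tst = np.lib.stride_tricks.sliding_window_view(
        np.asarray(test, dtype=float), window)
    # Pairwise Euclidean distances between test and reference windows.
    dists = np.linalg.norm(tst[:, None, :] - ref[None, :, :], axis=2)
    return dists.min(axis=1)
```

In this sketch, a series that ramps linearly and then stays flat collapses to two segments under a small `max_error`, and a spike injected into an otherwise familiar series yields its highest novelty scores in the windows that cover the spike. The pairwise distance matrix is quadratic in the number of windows, so the scoring function is only suitable for short series as written.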
