A general framework for time series data mining based on event analysis: Application to the medical domains of electroencephalography and stabilometry

In many domains, information is recorded over a period of time, producing sequences of data known as time series. In some of these domains, such as medicine, time series analysis needs to focus on certain regions of interest, known as events, rather than on the whole time series. In this paper, we propose a framework for knowledge discovery in both one-dimensional and multidimensional time series containing events. We show how our approach can be used to classify medical time series by means of a process that identifies events in the time series, generates time series reference models of representative events, and compares two time series by analyzing the events they have in common. We applied the framework to time series generated in the areas of electroencephalography (EEG) and stabilometry. Framework performance was evaluated in terms of classification accuracy, and the results confirmed that the proposed scheme has potential for classifying EEG and stabilometric signals. The framework is therefore useful for discovering knowledge from medical time series containing events, and these results would be equally applicable to other medical domains generating iconographic time series, such as electrocardiography (ECG).
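The idea of comparing two time series through the events they have in common can be illustrated with a minimal Python sketch. It is not the framework's actual algorithm. It assumes, purely for illustration, that an event is a maximal run of samples above a fixed threshold, that two events match when their peak values differ by at most a tolerance, and that similarity is a Dice-style overlap score. The names `find_events` and `event_similarity`, and the parameters `threshold` and `tol`, are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical event representation: (start index, end index, peak value).
Event = Tuple[int, int, float]


def find_events(series: Sequence[float], threshold: float) -> List[Event]:
    """Return maximal runs of samples strictly above `threshold`.

    Assumption: a real domain would define events with expert criteria;
    a fixed threshold is used here only to keep the sketch small.
    """
    events: List[Event] = []
    start = None
    for i, value in enumerate(series):
        if value > threshold and start is None:
            start = i  # an event begins
        elif value <= threshold and start is not None:
            events.append((start, i - 1, max(series[start:i])))
            start = None  # the event has ended
    if start is not None:  # event still open at the end of the series
        events.append((start, len(series) - 1, max(series[start:])))
    return events


def event_similarity(a: Sequence[float], b: Sequence[float],
                     threshold: float, tol: float) -> float:
    """Dice-style score: 2 * matched events / (events in a + events in b).

    Assumption: two events match when their peak values differ by at
    most `tol`; each event in `b` can be matched at most once.
    """
    events_a = find_events(a, threshold)
    events_b = find_events(b, threshold)
    if not events_a and not events_b:
        return 1.0  # no events on either side: treat as identical
    unused = list(events_b)
    matched = 0
    for _, _, peak_a in events_a:
        for candidate in unused:
            if abs(peak_a - candidate[2]) <= tol:
                matched += 1
                unused.remove(candidate)
                break
    return 2 * matched / (len(events_a) + len(events_b))


if __name__ == "__main__":
    series_a = [0, 0, 5, 6, 0, 0, 9, 0]
    series_b = [0, 6, 0, 0, 0, 0, 0, 0]
    print(find_events(series_a, threshold=1))
    print(event_similarity(series_a, series_b, threshold=1, tol=0.5))
```

Under these assumptions, a classifier could assign a new series to the class whose reference model yields the highest score. How the framework itself builds reference models and measures similarity is not specified in this abstract.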
