Classification-driven temporal discretization of multivariate time series

Biomedical data, in particular electronic medical records data, include a large number of variables sampled irregularly, often comprising both time points and time intervals, which poses several challenges for analysis and data mining. Classification of multivariate time series data is a challenging task, but is often necessary for medical care or research. Increasingly, temporal abstraction, in which a series of raw-data time points is abstracted into a set of symbolic time intervals, is being used for classification of multivariate time series. In this paper, we introduce a novel supervised discretization method, geared towards enhancing classification accuracy, which determines the cutoffs that best discriminate among classes through the distribution of their states. We present a framework for classification of multivariate time series, which implements three phases: (1) application of a temporal-abstraction process that transforms a series of raw time-stamped data points into a series of symbolic time intervals (based on either unsupervised or supervised temporal abstraction); (2) mining these time intervals to discover frequent temporal-interval relation patterns (TIRPs), using versions of Allen’s 13 temporal relations; (3) using the patterns as features to induce a classifier. We evaluated the framework on real datasets in the domains of diabetes, intensive care, and infectious hepatitis, focusing on a comparison of three versions of the new, supervised, temporal discretization for classification (TD4C) method, each relying on a different symbolic-state distribution-distance measure among outcome classes, with several commonly used unsupervised methods. Using only three abstract temporal relations resulted in better classification performance than using Allen’s seven relations, especially when using three symbolic states per variable. Similarly, using the horizontal support and mean duration as the TIRP feature representation resulted in better classification performance than using a binary (existence) representation. The classification performance of the three versions of TD4C was superior to that of the unsupervised discretization methods: equal-width discretization (EWD), symbolic aggregate approximation (SAX), and knowledge-based (KB) discretization.
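The idea of choosing cutoffs by how well the resulting state distributions separate the outcome classes, and then abstracting raw points into symbolic intervals (phase 1), can be illustrated with a minimal Python sketch. This is not the TD4C implementation: the abstract does not name the distance measures, so a symmetric Kullback-Leibler divergence is assumed here, and the function names (`state_distribution`, `choose_cutoffs`, `to_intervals`), the exhaustive search over candidate cutoffs, the Laplace smoothing, and the toy data are all hypothetical choices made for illustration.

```python
from itertools import combinations
import math


def state_of(value, cutoffs):
    """Index of the symbolic state that `value` falls into (0 = lowest)."""
    return sum(value >= c for c in cutoffs)


def state_distribution(values, cutoffs):
    """Smoothed fraction of `values` in each state defined by `cutoffs`."""
    counts = [0] * (len(cutoffs) + 1)
    for v in values:
        counts[state_of(v, cutoffs)] += 1
    # Laplace smoothing (an assumption) keeps the divergence finite.
    return [(c + 1) / (len(values) + len(counts)) for c in counts]


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler divergence (assumed distance measure)."""
    return sum(pi * math.log(pi / qi) + qi * math.log(qi / pi)
               for pi, qi in zip(p, q))


def choose_cutoffs(values_by_class, candidates, n_states=3):
    """Pick the n_states - 1 cutoffs whose per-class state distributions
    are farthest apart, summed over all pairs of classes."""
    best_score, best = -1.0, None
    for cutoffs in combinations(sorted(candidates), n_states - 1):
        dists = [state_distribution(v, cutoffs)
                 for v in values_by_class.values()]
        score = sum(symmetric_kl(p, q) for p, q in combinations(dists, 2))
        if score > best_score:
            best_score, best = score, cutoffs
    return best


def to_intervals(series, cutoffs):
    """Merge consecutive same-state (time, value) points into
    (start, end, state) symbolic intervals. Gaps between samples are
    ignored here, which a real abstraction process would not do."""
    intervals = []
    for t, v in series:
        s = state_of(v, cutoffs)
        if intervals and intervals[-1][2] == s:
            intervals[-1] = (intervals[-1][0], t, s)
        else:
            intervals.append((t, t, s))
    return intervals


# Toy data: one variable, two outcome classes, two states per variable.
values_by_class = {"a": [1, 2, 3, 2, 1], "b": [8, 9, 10, 9, 8]}
cutoffs = choose_cutoffs(values_by_class, [0.5, 5, 12], n_states=2)
intervals = to_intervals([(0, 1), (1, 2), (2, 8), (3, 9), (4, 1)], cutoffs)
```

On this toy data, only the candidate cutoff 5 places the two classes in different states, so it is selected, and the example series is abstracted into three intervals: low, high, low. Phases 2 and 3 (TIRP mining and classifier induction) are outside the scope of this sketch.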
