Extraction of Interpretable Multivariate Patterns for Early Diagnostics

Using temporal observations to predict a patient's future health state is a challenging task. An early and accurate prediction allows a more successful treatment to be designed, one that starts before a disease fully develops. Information for this kind of early diagnosis can be extracted with temporal data mining methods that handle complex multivariate time series. However, physicians usually prefer interpretable models that can be easily explained over more complex black-box approaches. In this study, a temporal data mining method is proposed for extracting interpretable patterns from multivariate time series data, which can be used to support interpretable early diagnosis. The problem is formulated as an optimization-based binary classification task addressed in three steps. First, the time series data are transformed into a binary matrix representation suitable for the application of classification methods. Second, a novel convex-concave optimization problem is defined to extract multivariate patterns from the constructed binary matrix. Third, a mixed-integer discrete optimization formulation is provided to reduce the dimensionality and extract interpretable multivariate patterns. The resulting interpretable multivariate patterns are then used for early classification in challenging clinical applications. In experiments on two human viral infection datasets and a larger myocardial infarction dataset, the proposed method was more accurate and provided classifications earlier than three alternative state-of-the-art methods.
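To make the first step more concrete, the following is a minimal sketch of one way a multivariate time series dataset could be turned into a binary matrix. It is an illustration under stated assumptions, not the construction used in the study: the abstract does not specify how the matrix is built, so the candidate patterns, the Euclidean sliding-window distance, the per-pattern thresholds, and the names `min_distance` and `binarize` are all hypothetical.

```python
import numpy as np


def min_distance(series, pattern):
    """Smallest Euclidean distance between `pattern` and any
    equal-length window of the 1-D array `series`."""
    pattern = np.asarray(pattern, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(
        np.asarray(series, dtype=float), len(pattern)
    )
    return float(np.sqrt(((windows - pattern) ** 2).sum(axis=1)).min())


def binarize(dataset, patterns, thresholds):
    """Build a (n_samples, n_patterns) binary matrix.

    dataset    : list of arrays shaped (n_variables, n_timepoints)
    patterns   : list of (variable_index, 1-D pattern) pairs (assumed given)
    thresholds : one distance threshold per pattern (assumed given)

    Entry (i, j) is 1 when pattern j occurs in sample i, i.e. when its
    closest window in the matching variable is within the threshold.
    """
    B = np.zeros((len(dataset), len(patterns)), dtype=int)
    for i, sample in enumerate(dataset):
        for j, ((var, pattern), thr) in enumerate(zip(patterns, thresholds)):
            B[i, j] = int(min_distance(sample[var], pattern) <= thr)
    return B


# Toy usage: two samples, two variables, six time points each.
sample_a = np.array([[0, 0, 1, 2, 1, 0], [5, 5, 5, 5, 5, 5]], dtype=float)
sample_b = np.array([[0, 0, 0, 0, 0, 0], [5, 6, 7, 6, 5, 5]], dtype=float)
patterns = [(0, [1, 2, 1]), (1, [5, 6, 7])]
thresholds = [0.5, 0.5]

B = binarize([sample_a, sample_b], patterns, thresholds)
print(B)
```

Each column of such a matrix corresponds to one univariate candidate pattern, so a later selection step can combine columns from different variables into a multivariate pattern while every selected feature remains directly readable as "this shape appeared in this variable".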
