Multivariate sequential contrast pattern mining and prediction models for critical care clinical informatics

Data mining and knowledge discovery involve the efficient search for patterns in data that describe the underlying complex structure and properties of the corresponding system. To be of practical use, the discovered patterns need to be novel, informative and interpretable. Large-scale unstructured biomedical databases such as electronic health records (EHRs) make the discovery of interesting and useful patterns harder. Typically, patients in intensive care units (ICUs) require constant monitoring of vital signs. To this end, significant quantities of patient data, together with waveform signals, are gathered from biosensors and clinical information systems. As a result, clinicians face an enormous challenge in assimilating and interpreting large volumes of unstructured, multidimensional, noisy and dynamically fluctuating patient data. The availability of de-identified ICU datasets such as the MIMIC-II (Multiparameter Intelligent Monitoring in Intensive Care) databases provides an opportunity to advance medical care by benchmarking algorithms that capture subtle patterns associated with specific medical conditions. Such patterns can provide fresh insights into disease dynamics over long time scales. In this research, we focus on the extraction of computational physiological markers in the form of relevant medical episodes, event sequences and distinguishing sequential patterns. These interesting patterns, known as sequential contrast patterns, are combined with patient clinical features to develop powerful clinical prediction models. Later, the clinical models are
