Predictive data mining in clinical medicine: a focus on selected methods and applications

Predictive data mining in clinical medicine deals with learning models that predict patients' health. Such models can support clinicians in diagnostic, therapeutic, or monitoring tasks. In clinical contexts, data mining methods are usually applied to retrospective data, giving healthcare professionals the opportunity to exploit the large amounts of data routinely collected during their daily activity. Moreover, clinicians can now use data mining techniques to deal with the huge amount of research results produced by molecular medicine, such as genetic or genomic signatures, which may allow a transition from population‐based to personalized medicine. The current challenge is to build data mining models that take into account the dynamic and temporal nature of clinical care and exploit the variety of information available at the bedside. This review describes the main features of predictive clinical data mining and focuses on two aspects of particular interest: methods able to deal with temporal data, and efforts to translate molecular medicine results into clinically useful data mining models. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 416–430 DOI: 10.1002/widm.23
