Mining the electronic health record for disease knowledge.

The growing volume and availability of electronic health record (EHR) data present new opportunities for discovering knowledge about diseases. Over the past decade, a growing number of data and text mining studies have focused on identifying disease associations (e.g., disease-disease, disease-drug, and disease-gene) in structured and unstructured EHR data. This chapter presents a knowledge discovery framework for mining the EHR for disease knowledge and describes each of its steps: data selection, preprocessing, transformation, data mining, and interpretation/validation. Natural language processing, standards, and data privacy and security are also discussed in the context of this framework.
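
As a rough illustration of how the five steps named above might fit together, the following minimal Python sketch runs them over a handful of made-up diagnosis records and scores disease-disease co-occurrence with support, confidence, and lift. The records, the function names, and the choice of lift as the interestingness measure are all assumptions made for this example; they are not taken from the chapter, and a real study would work from coded, de-identified EHR data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records (patient_id, diagnosis label); not real EHR data.
RAW_RECORDS = [
    ("p1", "Diabetes"), ("p1", "hypertension"), ("p1", "diabetes "),
    ("p2", "diabetes"), ("p2", "Hypertension"),
    ("p3", "asthma"), ("p3", None),
    ("p4", "hypertension"), ("p4", "asthma"),
]


def select(records):
    """Selection: keep only records that actually carry a diagnosis."""
    return [(pid, dx) for pid, dx in records if dx]


def preprocess(records):
    """Preprocessing: normalize case and stray whitespace in labels."""
    return [(pid, dx.strip().lower()) for pid, dx in records]


def transform(records):
    """Transformation: collapse records into one diagnosis set per patient."""
    by_patient = {}
    for pid, dx in records:
        by_patient.setdefault(pid, set()).add(dx)
    return by_patient


def mine(by_patient):
    """Data mining: support, confidence, and lift for each co-occurring pair."""
    n = len(by_patient)
    single = Counter(dx for dxs in by_patient.values() for dx in dxs)
    pairs = Counter(
        pair
        for dxs in by_patient.values()
        for pair in combinations(sorted(dxs), 2)
    )
    results = {}
    for (a, b), count in pairs.items():
        support = count / n
        results[(a, b)] = {
            "support": support,
            "confidence": count / single[a],  # estimate of P(b | a)
            "lift": support / ((single[a] / n) * (single[b] / n)),
        }
    return results


def interpret(results, min_lift=1.0):
    """Interpretation: keep pairs that co-occur more often than chance predicts."""
    return {pair: m for pair, m in results.items() if m["lift"] > min_lift}


associations = interpret(mine(transform(preprocess(select(RAW_RECORDS)))))
for pair, measures in associations.items():
    print(pair, measures)
```

On this toy input only the diabetes-hypertension pair passes the lift threshold. Validation against expert knowledge or the literature, which the framework pairs with interpretation, is outside what a sketch like this can show.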
