Mining electronic health records: towards better research applications and clinical care

Clinical data describing the phenotypes and treatment of patients are an underused resource with much greater research potential than is currently realized. Mining electronic health records (EHRs) has the potential to establish new patient-stratification principles and to reveal unknown disease correlations. Integrating EHR data with genetic data will also give a finer understanding of genotype–phenotype relationships. However, a broad range of ethical, legal and technical obstacles currently hinders both the systematic deposition of these data in EHRs and their mining. Here, we consider the potential for furthering medical research and clinical care using EHR data and the challenges that must be overcome before this potential becomes a reality.
