Extracting research-quality phenotypes from electronic health records to support precision medicine

The convergence of two rapidly developing technologies, high-throughput genotyping and electronic health records (EHRs), gives scientists an unprecedented opportunity to use routine healthcare data to accelerate genomic discovery. Institutions and healthcare systems have been building EHR-linked DNA biobanks to realize this vision. However, precisely extracting the detailed disease and drug-response phenotype information hidden in EHRs is not an easy task. Despite this difficulty, EHR-based studies have successfully replicated known associations, made new discoveries for diseases and drug-response traits, rapidly contributed cases and controls to large meta-analyses, and demonstrated the potential of EHRs for broad-based phenome-wide association studies. In this review, we summarize the advantages and challenges of repurposing EHR data for genetic research. We also highlight recent notable studies and novel approaches to provide an overview of advanced EHR-based phenotyping.
