PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability

OBJECTIVE Health care-generated data have become an important source for clinical and genomic research. Investigators often create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when they are validated and shared across multiple health care systems.

MATERIALS AND METHODS We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment that supports the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the components most frequently used in algorithms and the algorithms' performance at authoring institutions and at secondary implementation sites.

RESULTS As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development, spanning a range of traits and diseases. Phenotypes received over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Diseases (ICD) codes were the most frequently used component, followed by medications and natural language processing (NLP). Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) and at implementation sites (n = 40; case 97.5%, control 100%).

DISCUSSION These results demonstrate that a broad range of algorithms for mining electronic health record data from different health systems can be developed with high PPV, and that algorithms developed at one site are generally transportable to others.

CONCLUSION By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care-generated data.
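Phenotype algorithms combine components such as diagnosis codes, medications, and NLP output, and their performance is summarized as PPV. The code here is a minimal sketch of both ideas under stated assumptions. The `Patient` record, the `is_case` rule (two or more matching diagnosis codes, or one code plus medication or NLP evidence), and the example codes are hypothetical. They do not reproduce any PheKB algorithm. The chart-review counts are invented toy inputs, not PheKB results. PPV is computed as TP / (TP + FP) among the patients an algorithm flags. A control PPV would be computed the same way over flagged controls.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class Patient:
    """Toy patient record; the field names are hypothetical, not a PheKB schema."""
    icd_codes: list[str] = field(default_factory=list)   # diagnosis codes (repeats allowed)
    medications: set[str] = field(default_factory=set)    # normalized drug names
    nlp_mentions: set[str] = field(default_factory=set)   # concepts asserted in clinical notes


def is_case(p: Patient, codes: set[str], meds: set[str], concept: str) -> bool:
    """Hypothetical case rule: two or more matching diagnosis-code occurrences,
    or exactly one occurrence plus supporting medication or NLP evidence."""
    n_codes = sum(1 for c in p.icd_codes if c in codes)
    supporting = bool(p.medications & meds) or concept in p.nlp_mentions
    return n_codes >= 2 or (n_codes == 1 and supporting)


def ppv(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP) over the patients the algorithm flagged,
    with TP and FP assigned by manual chart review."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no patients were flagged by the algorithm")
    return true_positives / flagged


# Illustrative inputs only: example type 2 diabetes codes (ICD-9 250.00, ICD-10 E11.9).
T2D_CODES = {"250.00", "E11.9"}
T2D_MEDS = {"metformin"}
patients = [
    Patient(icd_codes=["250.00", "E11.9"]),                   # two code occurrences
    Patient(icd_codes=["E11.9"], medications={"metformin"}),  # one code plus a medication
    Patient(icd_codes=["E11.9"]),                             # one code, no supporting evidence
    Patient(nlp_mentions={"type 2 diabetes"}),                # NLP mention but no code
]
flags = [is_case(p, T2D_CODES, T2D_MEDS, "type 2 diabetes") for p in patients]
print(flags)  # [True, True, False, False]

# Invented chart-review counts (TP, FP) for three hypothetical algorithms.
ppvs = [ppv(45, 5), ppv(19, 1), ppv(30, 0)]
print(median(ppvs))  # 0.95
```

Counting code occurrences rather than distinct codes loosely mimics the repeated-diagnosis requirements that many published phenotype algorithms use. A real algorithm would also need date logic, exclusion criteria, and a separate control definition, which this sketch omits.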