Making work visible for electronic phenotype implementation: Lessons learned from the eMERGE network

BACKGROUND Implementing a phenotype algorithm requires phenotype engineers to interpret a human-readable description (text and flowcharts) and translate it into a computable phenotype, a process that can be labor intensive and error prone. Portable algorithms are needed to reduce this implementation effort.
METHODS We conducted a retrospective analysis of phenotype algorithms developed in the Electronic Medical Records and Genomics (eMERGE) network and identified the common customization tasks required for implementation. We developed a novel scoring system that quantifies portability along three aspects: Knowledge conversion, clause Interpretation, and Programming (KIP). Tasks were grouped into twenty representative categories. Experienced phenotype engineers estimated the average time spent on each category and evaluated the time saved in each category by a common data model (CDM), specifically the Observational Medical Outcomes Partnership (OMOP) model.
RESULTS A total of 485 distinct clauses (phenotype criteria) were identified from 55 phenotype algorithms, corresponding to 1153 customization tasks. In addition to 25 non-phenotype-specific tasks, 46 tasks relate to interpretation, 613 to knowledge conversion, and 469 to programming. Each aspect is assigned a score between 0 and 2 (0 for easy, 1 for moderate, and 2 for difficult portability), yielding a total KIP score range of 0 to 6. The average clause-wise KIP score is 1.37±1.38; by aspect, the average knowledge (K) score is 0.64±0.66, the interpretation (I) score is 0.33±0.55, and the programming (P) score is 0.40±0.64. Five percent of the categories can be completed within one hour (median), while 70% take from days to months to complete. The OMOP model can assist with vocabulary mapping tasks.
CONCLUSION This study presents firsthand knowledge of the substantial implementation effort in phenotyping and introduces a novel metric (KIP) that measures the portability of phenotype algorithms, quantifying that effort across the eMERGE Network. Phenotype developers are encouraged to analyze and optimize portability with regard to knowledge, interpretation, and programming. CDMs can be used to improve portability for some 'knowledge-oriented' tasks.
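
The arithmetic behind the clause-wise KIP score can be illustrated with a minimal sketch. It assumes only what the abstract states: each aspect is scored 0, 1, or 2, and the three aspect scores are summed to give a total between 0 and 6. The class name `ClauseScore`, the function `average_kip`, and the example scores are hypothetical and are not taken from the study's data or tooling.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ClauseScore:
    """Per-clause portability scores (hypothetical container, not from the paper)."""
    knowledge: int       # K: knowledge conversion, 0 (easy) to 2 (difficult)
    interpretation: int  # I: clause interpretation, 0 to 2
    programming: int     # P: programming, 0 to 2

    def __post_init__(self):
        # Each aspect must be one of the three levels described in the abstract.
        for name in ("knowledge", "interpretation", "programming"):
            value = getattr(self, name)
            if value not in (0, 1, 2):
                raise ValueError(f"{name} score must be 0, 1, or 2, got {value!r}")

    @property
    def kip(self) -> int:
        """Total KIP score for the clause, in the range 0 to 6."""
        return self.knowledge + self.interpretation + self.programming


def average_kip(clauses):
    """Mean clause-wise KIP score over a collection of clauses."""
    return mean(clause.kip for clause in clauses)


# Made-up example scores, for illustration only.
example_clauses = [
    ClauseScore(knowledge=2, interpretation=1, programming=0),
    ClauseScore(knowledge=0, interpretation=0, programming=0),
    ClauseScore(knowledge=2, interpretation=2, programming=2),
]
```

Because the total is a plain sum, the mean total equals the sum of the per-aspect means, which matches the reported figures: 0.64 + 0.33 + 0.40 = 1.37.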
