Ensembles of natural language processing systems for portable phenotyping solutions

BACKGROUND Manually curating standardized phenotypic concepts, such as Human Phenotype Ontology (HPO) terms, from narrative text in electronic health records (EHRs) is time-consuming and error-prone. Natural language processing (NLP) techniques can automate phenotype extraction and thus improve the efficiency of curating phenotypes from clinical texts. While individual NLP systems can perform well for a single cohort, an ensemble-based method might increase the portability of NLP pipelines across different cohorts.

METHODS We compared four NLP systems (MetaMapLite, MedLEE, ClinPhen and cTAKES) and four ensemble techniques (intersection, union, majority voting and machine learning) for extracting generic phenotypic concepts. We addressed two research questions regarding automated phenotype recognition. First, we evaluated the performance of the different approaches in identifying generic phenotypic concepts. Second, we compared the performance of the different approaches in identifying patient-specific phenotypic concepts. To better quantify the effect of differences in concept granularity on performance, we developed a novel evaluation metric that considers concept hierarchies and frequencies. Each approach was evaluated on a gold standard set of clinical documents annotated by clinical experts. One dataset containing 1,609 concepts derived from 50 clinical notes from two different institutions was used in both evaluations, and an additional dataset of 608 concepts derived from 50 case report abstracts obtained from PubMed was used only for the evaluation of generic phenotypic concept identification.

RESULTS For generic phenotypic concept recognition, the top three performers in the NYP/CUIMC dataset were the union ensemble (F1, 0.643), the training-based ensemble (F1, 0.632) and the majority vote-based ensemble (F1, 0.622). In the Mayo dataset, the top three were the majority vote-based ensemble (F1, 0.642), cTAKES (F1, 0.615) and MedLEE (F1, 0.559). In the PubMed dataset, the top three were the majority vote-based ensemble (F1, 0.719), the training-based ensemble (F1, 0.696) and MetaMapLite (F1, 0.694). For identifying patient-specific phenotypes, the top three performers in the NYP/CUIMC dataset were the majority vote-based ensemble (F1, 0.610), MedLEE (F1, 0.609) and the training-based ensemble (F1, 0.585). In the Mayo dataset, the top three were the majority vote-based ensemble (F1, 0.604), cTAKES (F1, 0.531) and MedLEE (F1, 0.527).

CONCLUSIONS Our study demonstrates that ensembles of NLP systems can improve both generic phenotypic concept recognition and patient-specific phenotypic concept identification over individual systems. Among the individual NLP systems, each performed best on the dataset for which it was primarily designed. However, combining multiple NLP systems into an ensemble generally improves performance. Specifically, an ensemble can increase the reproducibility of results across different cohorts and tasks, and thus provide a more portable phenotyping solution than individual NLP systems.
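The three set-based ensemble techniques named in METHODS (intersection, union and majority voting) can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names, the strict-majority threshold and the toy system outputs are assumptions made for illustration, and the F1 shown is plain exact-match F1 rather than the hierarchy- and frequency-aware metric described above. The training-based ensemble is omitted because the abstract does not specify its model.

```python
from collections import Counter


def union_ensemble(outputs):
    """Keep every concept reported by at least one system."""
    return set().union(*outputs.values())


def intersection_ensemble(outputs):
    """Keep only concepts reported by all systems."""
    sets = list(outputs.values())
    return set.intersection(*sets) if sets else set()


def majority_vote_ensemble(outputs, min_votes=None):
    """Keep concepts reported by at least min_votes systems.

    Assumption: the default threshold is a strict majority
    (more than half of the systems).
    """
    if min_votes is None:
        min_votes = len(outputs) // 2 + 1
    votes = Counter(c for concepts in outputs.values() for c in concepts)
    return {c for c, n in votes.items() if n >= min_votes}


def f1_score(predicted, gold):
    """Exact-match F1 between predicted and gold concept sets."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy outputs for one note; invented for illustration, not taken from the study.
outputs = {
    "MetaMapLite": {"HP:0001250", "HP:0001263", "HP:0000252"},
    "MedLEE": {"HP:0001250", "HP:0001263"},
    "ClinPhen": {"HP:0001250", "HP:0004322"},
    "cTAKES": {"HP:0001250", "HP:0001263", "HP:0004322"},
}
gold = {"HP:0001250", "HP:0001263", "HP:0004322"}

for name, ensemble in [
    ("union", union_ensemble),
    ("intersection", intersection_ensemble),
    ("majority", majority_vote_ensemble),
]:
    predicted = ensemble(outputs)
    print(name, sorted(predicted), round(f1_score(predicted, gold), 3))
```

In this toy case the union favors recall at the cost of precision, the intersection does the opposite, and majority voting sits between the two, which is the trade-off that motivates comparing the techniques across cohorts.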
