Feasibility of Using Clinical Element Models (CEM) to Standardize Phenotype Variables in the Database of Genotypes and Phenotypes (dbGaP)

The Database of Genotypes and Phenotypes (dbGaP) contains various types of data generated in genome-wide association studies (GWAS). These data can facilitate novel scientific discovery and reduce the cost and time of exploratory research. However, idiosyncrasies in variable names are a major barrier to reusing these data. We studied the problem of formalizing phenotype variable descriptions using Clinical Element Models (CEMs). Direct mapping of 379 phenotype names to existing CEMs yielded a low rate of exact matches (N=25). However, the flexible and expressive information models underlying CEMs provided a robust means of representing 115 phenotype variable descriptions, indicating that CEMs can be successfully applied to standardize a large portion of the clinical variables contained in dbGaP.
