On Mapping Textual Queries to a Common Data Model

The widespread adoption of Electronic Health Records (EHRs) has enabled data-driven approaches to clinical care and research. However, the performance and generalizability of these approaches are severely hampered by the lack of syntactic and semantic interoperability of EHR data across institutions. To address this problem, Common Data Models (CDMs) can be used to standardize the data held in clinical data repositories. In this paper, we describe our mapping of entity mention types from patient-level information retrieval queries to an empirically derived subset of Observational Medical Outcomes Partnership (OMOP) CDM data fields. We evaluated this empirical data model by annotating free-text clinical data requests from multiple institutions and comparing the distributions of data model fields across sites. The similar distributions of entity mention types at the two sites indicate that the data model generalizes to multi-institutional cohort identification queries.
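The abstract does not state how the per-site field distributions were compared. As an illustration only, the following Python sketch maps annotated mention types to OMOP CDM fields through a hypothetical lookup table (`MENTION_TO_OMOP`) and scores how similar two sites' field distributions are using the Jensen-Shannon divergence. The mapping, the function names, the toy annotations, and the choice of metric are all assumptions, not the authors' method or their empirical field subset.

```python
import math
from collections import Counter

# HYPOTHETICAL mapping from annotated entity mention types to OMOP CDM
# table.field names; the paper's actual empirical subset may differ.
MENTION_TO_OMOP = {
    "condition": "condition_occurrence.condition_concept_id",
    "drug": "drug_exposure.drug_concept_id",
    "lab_test": "measurement.measurement_concept_id",
    "procedure": "procedure_occurrence.procedure_concept_id",
    "gender": "person.gender_concept_id",
    "age": "person.year_of_birth",
}


def field_distribution(mention_types):
    """Map mention types to OMOP fields and return relative frequencies.

    Mention types absent from the mapping are ignored; an empty dict is
    returned when nothing maps.
    """
    counts = Counter(MENTION_TO_OMOP[m] for m in mention_types if m in MENTION_TO_OMOP)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {field: n / total for field, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    fields = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of observed fields.
    m = {f: 0.5 * (p.get(f, 0.0) + q.get(f, 0.0)) for f in fields}

    def kl_to_mixture(a):
        # KL(A || M); terms with a[f] == 0 contribute nothing.
        return sum(a[f] * math.log2(a[f] / m[f]) for f in fields if a.get(f, 0.0) > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Toy annotations for two hypothetical sites (illustrative, not study data).
site_a = ["condition", "drug", "condition", "lab_test", "age"]
site_b = ["condition", "drug", "lab_test", "condition", "gender"]
print(round(js_divergence(field_distribution(site_a), field_distribution(site_b)), 3))  # prints 0.2
```

A divergence near 0 would correspond to the "similar distribution" the abstract reports; other measures (for example a chi-square test on the raw counts) would serve equally well, and the sketch does not imply which one the study used.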
