Building a Semantic Health Data Warehouse in the Context of Clinical Trials: Development and Usability Study

Background: Hospitals record and maintain a huge amount of clinical, administrative, and demographic data, which can be consistently aggregated into health data warehouses with a uniform data model. In 2017, Rouen University Hospital (RUH) initiated the design of a semantic health data warehouse enabling both semantic description and retrieval of health information.

Objective: This study aimed to present a proof of concept of this semantic health data warehouse, based on the data of 250,000 patients from RUH, and to assess its ability to assist health professionals in prescreening eligible patients in a clinical trials context.

Methods: The semantic health data warehouse relies on 3 distinct semantic layers: (1) a terminology and ontology portal, (2) a semantic annotator, and (3) a semantic search engine and NoSQL (not only structured query language) layer to enhance data access performance. The system adopts an entity-centered vision that provides generic search capabilities, so that data requirements can be expressed in terms of the whole set of interconnected conceptual entities that compose health information.

Results: We assessed the ability of the system to assist the search for 95 inclusion and exclusion criteria originating from 5 randomly chosen clinical trials from RUH. The system succeeded in fully automating 39% (29/74) of the criteria and was efficiently used as a prescreening tool for 73% (54/74) of them. For each criterion, we also recorded the targeted sources of information and the search engine–related or data-related limitations that could explain the results.

Conclusions: The entity-centered vision contrasts with the usual patient-centered vision adopted by existing systems. It makes the information retrieval process more generic and allows the semantic description of health information to be fully exploited. Although the clinical narratives were semantically annotated, searching within them remained the major challenge for the system.
Finer annotation of the clinical texts and the addition of specific functionalities would significantly improve the results. The semantic aspect of the system, combined with its generic entity-centered vision, enables the processing of a wide range of clinical questions. However, a substantial share of health information remains in clinical narratives, and we are currently investigating novel approaches (deep learning) to enhance the semantic annotation of these unstructured data.
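
To make the entity-centered vision described in the Methods more concrete, the following is a minimal sketch in Python. It is not the RUH system or its query language: the `Entity` and `ToyWarehouse` classes, the `search` and `follow` methods, the relation name `patient`, and all data values are hypothetical and chosen only for illustration. The idea it shows is that a search can target any type of entity (here, laboratory tests) and then navigate links to related entities (here, patients), rather than always starting from the patient.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Entity:
    """A generic health-data entity (hypothetical model, for illustration)."""
    entity_id: str
    entity_type: str                      # e.g. "patient", "lab_test"
    attrs: Dict[str, object] = field(default_factory=dict)
    links: Dict[str, List[str]] = field(default_factory=dict)  # relation -> ids


class ToyWarehouse:
    """In-memory stand-in for a warehouse of interconnected entities."""

    def __init__(self, entities: List[Entity]) -> None:
        self.by_id = {e.entity_id: e for e in entities}

    def search(self, entity_type: str,
               where: Callable[[Entity], bool] = lambda e: True) -> List[Entity]:
        # Any entity type can be the target of a search, not only patients.
        return [e for e in self.by_id.values()
                if e.entity_type == entity_type and where(e)]

    def follow(self, entities: List[Entity], relation: str) -> List[Entity]:
        # Navigate from a result set to linked entities, without duplicates.
        found: Dict[str, Entity] = {}
        for e in entities:
            for target_id in e.links.get(relation, []):
                found[target_id] = self.by_id[target_id]
        return list(found.values())


# Hypothetical toy data: two patients, each with one HbA1c result.
warehouse = ToyWarehouse([
    Entity("p1", "patient", {"birth_year": 1956}),
    Entity("p2", "patient", {"birth_year": 1981}),
    Entity("l1", "lab_test", {"code": "HbA1c", "value": 8.2}, {"patient": ["p1"]}),
    Entity("l2", "lab_test", {"code": "HbA1c", "value": 5.4}, {"patient": ["p2"]}),
])

# Illustrative criterion: "HbA1c above 7%". Search lab tests first,
# then follow the link to the patients they belong to.
high_hba1c = warehouse.search(
    "lab_test",
    lambda e: e.attrs["code"] == "HbA1c" and e.attrs["value"] > 7.0,
)
candidates = warehouse.follow(high_hba1c, "patient")

print([e.entity_id for e in high_hba1c])   # matching lab tests
print([e.entity_id for e in candidates])   # patients to prescreen
```

In this sketch the laboratory tests are results in their own right and the patient list is derived by one navigation step, which is one way to read the contrast with a patient-centered design where every query returns patients.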
