Restricted natural language based querying of clinical databases

PURPOSE To raise the level of care provided to the community, healthcare professionals need usable tools for extracting knowledge from clinical data. This paper proposes a generic algorithm that translates a restricted natural language query (RNLQ) into a standard query language such as SQL (Structured Query Language).

METHODS A special-purpose clinical data analytics language (CliniDAL) is introduced, which provides a scheme of six classes of clinical questioning templates. A translation algorithm is proposed that converts a user's RNLQ into SQL queries using a similarity-based Top-k algorithm in CliniDAL's mapping process. In addition, a two-layer rule-based method interprets the temporal expressions in the query, based on the proposed temporal model. The mapping and translation algorithms are generic and can therefore work with clinical databases in three data design models: Entity-Relationship (ER), Entity-Attribute-Value (EAV) and XML. In the current work, however, they are implemented only for the ER and EAV design models.

RESULTS An RNLQ is easy to compose via CliniDAL's interface, in which query terms are automatically mapped to the underlying data models of a Clinical Information System (CIS) with an accuracy of more than 84%. Temporal expressions in the query, comprising absolute times, relative times or relative events, can be automatically mapped to time entities of the underlying CIS and to normalized temporal comparative values.

CONCLUSION The proposed CliniDAL solution, using the generic mapping and translation algorithms enhanced by a temporal analyzer component, provides a simple mechanism for composing RNLQs to extract knowledge from CISs with different data design models for analytics purposes.
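The abstract does not specify CliniDAL's similarity measure, candidate set or SQL templates, so the following Python sketch only illustrates the general idea of similarity-based Top-k mapping under stated assumptions. Character-trigram Jaccard similarity stands in for whatever measure CliniDAL actually uses, and the `Candidate` type, the functions `top_k` and `to_sql_condition`, and the table and column names (`vitals`, `observation`, `attribute`, `value`) are all hypothetical. The sketch also shows, in simplified form, why the same mapped term has to be rendered differently for ER and EAV designs.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical schema/data element that a query term may map to."""
    kind: str   # "column" for an ER design, "attribute" for an EAV design
    table: str
    name: str


def trigrams(text: str) -> set[str]:
    """Character trigrams of a lower-cased, space-padded string."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of trigram sets (assumed stand-in measure)."""
    ta, tb = trigrams(a), trigrams(b)
    # Padding guarantees at least one trigram, so the union is never empty.
    return len(ta & tb) / len(ta | tb)


def top_k(term: str, candidates: list[Candidate], k: int = 3) -> list[tuple[float, Candidate]]:
    """Score every candidate against the query term and keep the k best."""
    scored = [(similarity(term, c.name), c) for c in candidates]
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


def to_sql_condition(c: Candidate, op: str, value: str) -> str:
    """Render a WHERE-clause fragment for the mapped element.

    Illustration only: real code should use bound parameters rather than
    string formatting, to avoid SQL injection.
    """
    if c.kind == "column":
        # ER design: the clinical concept is a physical column.
        return f"{c.table}.{c.name} {op} {value}"
    # EAV design: the concept is a row value stored in an attribute column.
    return f"{c.table}.attribute = '{c.name}' AND {c.table}.value {op} {value}"


if __name__ == "__main__":
    schema = [
        Candidate("column", "vitals", "heart_rate"),
        Candidate("column", "vitals", "resp_rate"),
        Candidate("attribute", "observation", "Heart Rate"),
        Candidate("column", "patient", "birth_date"),
    ]
    for score, cand in top_k("heart rate", schema, k=2):
        print(f"{score:.2f}  {to_sql_condition(cand, '>', '100')}")
```

Running the script ranks the hypothetical EAV attribute `Heart Rate` first (score 1.00) and the ER column `heart_rate` second (about 0.57), so a single query term can resolve to different physical structures depending on the data design model. Choosing among the top-k candidates is left out of this sketch.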
