Clinical questionnaire filling based on question answering framework

BACKGROUND Electronic health records (EHRs) are the foundation of much medical research. However, analyzing the text data of EHRs directly is a challenging task, so physicians often use questionnaires to first convert text data into structured data. Filling in these questionnaires requires a considerable amount of time and medical knowledge. Developing an algorithm that lets computers fill out these questionnaires automatically is therefore an important task.

OBJECTIVE This research aims to build a deep learning model that can automatically complete questionnaires from given medical text.

METHODS This task is a form of information extraction (IE), but it differs from existing medical IE tasks. Because the questions in the questionnaires are closed-ended, meaning that each answer is selected from a given set of options, we can treat the task as a classification problem. However, conventional classification algorithms are resource-intensive when a single model is used to fill out an entire questionnaire, and they cannot use the question information to guide the filling. To address these issues, we propose a neural network model based on a question answering (QA) framework. With this framework, a single model can fill out one complete questionnaire.

RESULTS We perform experiments on three real-world Chinese medical datasets and their related clinical questionnaires. Our model achieves F1 scores of 0.9273, 0.8834, and 0.9846, respectively, outperforming several baseline models.

CONCLUSION The strong performance of our QA model will allow us to build a system that helps physicians fill out questionnaires and convert text data into structured data, reducing their workload.
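The QA framing described in METHODS can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the names `Question`, `overlap_score`, and `fill_questionnaire` are hypothetical, the example record and questions are invented, and the token-overlap scorer is a toy stand-in for the neural network, not the model proposed in the paper. The sketch only shows the shape of the idea: each closed-ended question becomes a (record, question, option) triple, and one shared scoring function answers every question in the questionnaire.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Question:
    """A closed-ended question: the answer is one of the given options."""
    text: str
    options: List[str]


def overlap_score(record: str, question: str, option: str) -> float:
    """Toy stand-in for a neural scorer (assumption, not the paper's model).

    Returns the fraction of option tokens that appear in the record.
    This toy version ignores the question text; a real QA model would
    condition on it to guide the choice.
    """
    record_tokens = set(record.lower().split())
    option_tokens = option.lower().split()
    if not option_tokens:
        return 0.0
    hits = sum(tok in record_tokens for tok in option_tokens)
    return hits / len(option_tokens)


def fill_questionnaire(
    record: str,
    questions: List[Question],
    score: Callable[[str, str, str], float] = overlap_score,
) -> Dict[str, str]:
    """Answer every question with the same scoring function.

    For each question, every option is scored against the record and
    the highest-scoring option is selected.
    """
    answers: Dict[str, str] = {}
    for q in questions:
        answers[q.text] = max(
            q.options, key=lambda opt: score(record, q.text, opt)
        )
    return answers


# Invented example data, for illustration only.
record = "patient reports chest pain and a history of smoking"
questions = [
    Question("Does the patient smoke?", ["smoking", "no tobacco use"]),
    Question("What is the main symptom?", ["chest pain", "headache"]),
]
answers = fill_questionnaire(record, questions)
print(answers)
```

Because the scorer takes the question and option as inputs rather than having a fixed output layer per question, adding a new question to the questionnaire in this sketch requires no new model, only a new `Question` entry.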
