Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources

Machine Reading Comprehension (MRC) aims to extract answers to questions from a given passage. It has been widely studied in recent years, especially in the open domain, yet closed-domain MRC has received little attention, mainly due to the lack of large-scale training data. In this paper, we introduce a multi-target MRC task for the medical domain: given a medical question and a passage from a medical information source, a model must predict both the answer and the supporting sentences simultaneously, so that the reliability of the served medical knowledge can be verified. For this task, we manually construct a high-quality dataset, the Multi-task Chinese Medical MRC dataset (CMedMRC), and provide a detailed analysis of it. We further propose the Chinese Medical BERT model (CMedBERT), which fuses medical knowledge into pre-trained language models through a dynamic fusion mechanism over heterogeneous features and a multi-task learning strategy. Experiments show that by fusing context-aware and knowledge-aware token representations, CMedBERT consistently outperforms strong baselines.
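To make the described architecture concrete, below is a minimal PyTorch sketch of gated fusion between context-aware token states and knowledge-aware entity embeddings, with two task heads trained jointly (answer-span extraction and support-sentence classification). This is an illustrative reconstruction under stated assumptions, not the authors' actual CMedBERT code: the module names, the sigmoid-gate formulation, the sentence pooling, and the loss weighting are all assumptions.

```python
# Hedged sketch of knowledge fusion + multi-task MRC heads.
# NOT the authors' CMedBERT implementation; gating and loss
# weighting here are illustrative assumptions.
import torch
import torch.nn as nn

class KnowledgeFusionMRC(nn.Module):
    """Fuses context-aware token states with knowledge-aware entity
    embeddings via a learned gate, then solves two tasks jointly:
    (1) answer-span extraction and (2) support-sentence prediction."""

    def __init__(self, hidden_size: int = 768, kg_dim: int = 100):
        super().__init__()
        # Project KG entity embeddings into the encoder's hidden space.
        self.kg_proj = nn.Linear(kg_dim, hidden_size)
        # Per-token gate deciding how much knowledge to mix in.
        self.gate = nn.Linear(2 * hidden_size, hidden_size)
        # Task heads: start/end logits for the span, binary support label.
        self.span_head = nn.Linear(hidden_size, 2)
        self.support_head = nn.Linear(hidden_size, 2)

    def forward(self, token_states, kg_states, sentence_states):
        # token_states:    (batch, seq_len, hidden)   from a BERT encoder
        # kg_states:       (batch, seq_len, kg_dim)   aligned entity vectors
        # sentence_states: (batch, num_sents, hidden) pooled sentence vectors
        k = self.kg_proj(kg_states)
        g = torch.sigmoid(self.gate(torch.cat([token_states, k], dim=-1)))
        fused = g * token_states + (1.0 - g) * k  # dynamic fusion

        start_logits, end_logits = self.span_head(fused).split(1, dim=-1)
        support_logits = self.support_head(sentence_states)
        return (start_logits.squeeze(-1),
                end_logits.squeeze(-1),
                support_logits)

def multi_task_loss(start_logits, end_logits, support_logits,
                    start_pos, end_pos, support_labels, alpha: float = 0.5):
    """Weighted sum of the span loss and the support-sentence loss.
    The fixed weighting `alpha` is an assumption, not from the paper."""
    ce = nn.CrossEntropyLoss()
    span_loss = (ce(start_logits, start_pos) + ce(end_logits, end_pos)) / 2
    # CrossEntropyLoss expects (batch, classes, num_sents) for per-sentence labels.
    support_loss = ce(support_logits.transpose(1, 2), support_labels)
    return alpha * span_loss + (1.0 - alpha) * support_loss
```

In a full system, `token_states` would come from a Chinese pre-trained BERT encoder and `kg_states` from pre-trained medical knowledge-graph embeddings (e.g., TransE-style entity vectors) aligned to entity mentions in the passage; the two heads are then optimized jointly, matching the multi-task setup the abstract describes.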
