Lessons from Natural Language Inference in the Clinical Domain

State of the art models using deep neural networks have become very good in learning an accurate mapping from inputs to outputs. However, they still lack generalization capabilities in conditions that differ from the ones encountered during training. This is even more challenging in specialized, and knowledge intensive domains, where training data is limited. To address this gap, we introduce MedNLI - a dataset annotated by doctors, performing a natural language inference task (NLI), grounded in the medical history of patients. We present strategies to: 1) leverage transfer learning using datasets from the open domain, (e.g. SNLI) and 2) incorporate domain knowledge from external data and lexical sources (e.g. medical terminologies). Our results demonstrate performance gains using both strategies.

[1]  Eric Fosler-Lussier,et al.  Textual inference for eligibility criteria resolution in clinical trials , 2015, J. Biomed. Informatics.

[2]  Ido Dagan,et al.  The Third PASCAL Recognizing Textual Entailment Challenge , 2007, ACL-PASCAL@ACL.

[3]  Franck Dernoncourt,et al.  Transfer Learning for Named-Entity Recognition with Neural Networks , 2017, LREC.

[4]  Geoffrey E. Hinton,et al.  Grammar as a Foreign Language , 2014, NIPS.

[5]  Paul K J Han,et al.  Varieties of uncertainty in health care: a conceptual taxonomy. , 2011, Medical decision making : an international journal of the Society for Medical Decision Making.

[6]  Phil Blunsom,et al.  Reasoning about Entailment with Neural Attention , 2015, ICLR.

[7]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[8]  Ido Dagan,et al.  Entailment-based Text Exploration with Application to the Health-care Domain , 2012, ACL.

[9]  Jens Lehmann,et al.  DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia , 2015, Semantic Web.

[10]  Omer Levy,et al.  Annotation Artifacts in Natural Language Inference Data , 2018, NAACL.

[11]  Yoshua Bengio,et al.  On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.

[12]  Dan Roth,et al.  “Ask Not What Textual Entailment Can Do for You...” , 2010, ACL.

[13]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[14]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[15]  Samuel R. Bowman,et al.  The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations , 2017, RepEval@EMNLP.

[16]  Olivier Bodenreider,et al.  Comparing terms, concepts and semantic classes in WordNet and the Unified Medical Language System , 2001 .

[17]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[18]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[19]  Christopher Potts,et al.  A large annotated corpus for learning natural language inference , 2015, EMNLP.

[20]  Siddharth Patwardhan,et al.  Addressing Limited Data for Textual Entailment Across Domains , 2016, ACL.

[21]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[22]  Georgios Balikas,et al.  An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition , 2015, BMC Bioinformatics.

[23]  John F. Hurdle,et al.  Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent Research , 2008, Yearbook of Medical Informatics.

[24]  Sunil Kumar Sahu,et al.  What matters in a transferable neural network model for relation classification in the biomedical domain? , 2017, Artif. Intell. Medicine.

[25]  Holger Schwenk,et al.  Supervised Learning of Universal Sentence Representations from Natural Language Inference Data , 2017, EMNLP.

[26]  Chunhua Weng,et al.  Formal representation of eligibility criteria: A literature review , 2010, J. Biomed. Informatics.

[27]  M. McHugh Interrater reliability: the kappa statistic , 2012, Biochemia medica.

[28]  Masatoshi Tsuchiya,et al.  Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment , 2018, LREC.

[29]  Deniz Yuret,et al.  Transfer Learning for Low-Resource Neural Machine Translation , 2016, EMNLP.

[30]  Rachel Rudinger,et al.  Hypothesis Only Baselines in Natural Language Inference , 2018, *SEMEVAL.

[31]  Alan R. Aronson,et al.  An overview of MetaMap: historical perspective and recent advances , 2010, J. Am. Medical Informatics Assoc..

[32]  Zhen-Hua Ling,et al.  Enhanced LSTM for Natural Language Inference , 2016, ACL.

[33]  Olivier Bodenreider,et al.  The Unified Medical Language System (UMLS): integrating biomedical terminology , 2004, Nucleic Acids Res..

[34]  Jakob Uszkoreit,et al.  A Decomposable Attention Model for Natural Language Inference , 2016, EMNLP.

[35]  Asma Ben Abacha,et al.  Recognizing Question Entailment for Medical Question Answering , 2016, AMIA.

[36]  Richard Socher,et al.  Learned in Translation: Contextualized Word Vectors , 2017, NIPS.

[37]  Carol Friedman,et al.  Two biomedical sublanguages: a description based on the theories of Zellig Harris , 2002, J. Biomed. Informatics.

[38]  Asma Ben Abacha,et al.  Semantic Analysis and Automatic Corpus Construction for Entailment Recognition in Medical Texts , 2015, AIME.

[39]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[40]  Andrew Y. Ng,et al.  CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning , 2017, ArXiv.

[41]  J Quinonero Candela,et al.  Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Tectual Entailment , 2006, Lecture Notes in Computer Science.

[42]  Peter Szolovits,et al.  MIMIC-III, a freely accessible critical care database , 2016, Scientific Data.

[43]  Ido Dagan,et al.  Recognizing textual entailment: Rational, evaluation and approaches , 2009, Natural Language Engineering.

[44]  Samuel R. Bowman,et al.  A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference , 2017, NAACL.

[45]  Yonatan Belinkov,et al.  Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks , 2016, ICLR.

[46]  Ted Pedersen,et al.  Measures of semantic similarity and relatedness in the biomedical domain , 2007, J. Biomed. Informatics.

[47]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[48]  Gabriel Stanovsky,et al.  Recognizing Mentions of Adverse Drug Reaction in Social Media Using Knowledge-Infused Recurrent Models , 2017, EACL.