Syntactic Patterns Improve Information Extraction for Medical Search

Medical professionals search the published literature by specifying the type of patients, the medical intervention(s) and the outcome measure(s) of interest. In this paper we demonstrate how features encoding syntactic patterns improve the performance of state-of-the-art sequence tagging models (both linear and neural) for extracting these medically relevant categories. We present an analysis of the types of patterns exploited and of the semantic space induced for them, i.e., the distributed representations learned for the identified multi-token patterns. We show that these learned representations differ substantially from those of the constituent unigrams, suggesting that the patterns capture contextual information that is otherwise lost.
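To make the two ideas above more concrete, here is a minimal sketch under stated assumptions. It shows how matched multi-token patterns might become per-token features for a sequence tagger, and how a pattern's vector could be compared with the mean of its unigram vectors. Everything named here is hypothetical: the `PATTERNS` list, the `window` size, the feature names, and the helpers `pattern_features`, `cosine` and `pattern_vs_unigrams`. The paper's patterns are syntactic rather than a fixed lexical list, and cosine similarity against a mean-of-unigrams baseline is only one plausible way to compare the representations.

```python
import math
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL lexical patterns standing in for the paper's syntactic patterns.
PATTERNS: List[Tuple[str, ...]] = [
    ("patients", "with"),
    ("randomized", "to"),
    ("compared", "with"),
]


def pattern_features(tokens: Sequence[str], window: int = 3) -> List[Dict[str, object]]:
    """Return one feature dict per token (the list-of-dicts input form that
    CRF toolkits such as sklearn-crfsuite accept)."""
    lowered = [t.lower() for t in tokens]
    feats: List[Dict[str, object]] = [{"word.lower": w} for w in lowered]
    for start in range(len(lowered)):
        for pat in PATTERNS:
            end = start + len(pat)
            if tuple(lowered[start:end]) == pat:
                name = "_".join(pat)
                # Mark the tokens that make up the matched pattern.
                for i in range(start, end):
                    feats[i]["in_pattern=" + name] = True
                # Mark a short window of tokens following the pattern,
                # where e.g. a population or intervention mention may appear.
                for i in range(end, min(end + window, len(lowered))):
                    feats[i]["after_pattern=" + name] = True
    return feats


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def pattern_vs_unigrams(pattern_vec: Sequence[float],
                        unigram_vecs: Sequence[Sequence[float]]) -> float:
    """Compare a learned pattern vector with the mean of its unigram vectors."""
    dim = len(pattern_vec)
    mean = [sum(vec[d] for vec in unigram_vecs) / len(unigram_vecs) for d in range(dim)]
    return cosine(pattern_vec, mean)


if __name__ == "__main__":
    sent = "Patients with type 2 diabetes were randomized to metformin".split()
    for tok, f in zip(sent, pattern_features(sent)):
        print(tok, f)
```

In this sketch each token's dict could be passed directly to a linear CRF, while a neural tagger might instead embed the pattern identities alongside word embeddings. A low score from `pattern_vs_unigrams` is one sign that a pattern's representation is not simply the average of its parts.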
