Extracting Factual Min/Max Age Information from Clinical Trial Studies

Population age information is an essential characteristic of clinical trials. In this paper, we focus on extracting the minimum and maximum (min/max) age values of the study samples from clinical research articles. Specifically, we investigate the use of a neural question answering (QA) model to address this information extraction task. The min/max age QA model is trained on a large collection of structured clinical study records from this http URL. For each article, based on the multiple min and max age values extracted by the QA model, we predict the actual min/max age values for the study samples and filter out non-factual age expressions. When evaluated on an annotated dataset of 50 research papers on smoking cessation, our system outperforms (i) a passage-retrieval-based IE system and (ii) a CRF-based system by a large margin.
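The abstract does not state how the several min and max answers returned by the QA model for one article are combined into a single prediction. Purely as an illustration, and not as the authors' method, the aggregation step could be a score-weighted vote over candidate ages after a simple factuality filter. Everything in the sketch is a hypothetical assumption: the `AgeCandidate` type, the cue-word list used to flag non-factual sentences, the 0.5 confidence threshold, and the rule that discards a min/max pair when min exceeds max.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class AgeCandidate:
    """One age value returned by a (hypothetical) min/max age QA model."""
    value: float   # age in years parsed from the answer span
    score: float   # model confidence for the answer span
    context: str   # sentence the answer span was taken from


# Hypothetical cue list: words suggesting the age is not a fact about this study's sample.
NON_FACTUAL_CUES = frozenset({"may", "might", "could", "should", "recommended", "previous"})


def is_factual(c: AgeCandidate, threshold: float = 0.5) -> bool:
    """Keep confident candidates whose sentence contains no non-factual cue word."""
    words = set(re.findall(r"[a-z]+", c.context.lower()))
    return c.score >= threshold and not (words & NON_FACTUAL_CUES)


def aggregate(candidates: Iterable[AgeCandidate], threshold: float = 0.5) -> Optional[float]:
    """Score-weighted vote: the age value with the largest summed confidence wins."""
    votes = defaultdict(float)
    for c in candidates:
        if is_factual(c, threshold):
            votes[c.value] += c.score
    if not votes:
        return None  # no factual candidate survived the filter
    return max(votes, key=votes.get)


def predict_min_max(min_cands: Iterable[AgeCandidate],
                    max_cands: Iterable[AgeCandidate],
                    threshold: float = 0.5) -> Tuple[Optional[float], Optional[float]]:
    lo = aggregate(min_cands, threshold)
    hi = aggregate(max_cands, threshold)
    # Hypothetical consistency check: report nothing rather than min > max.
    if lo is not None and hi is not None and lo > hi:
        return None, None
    return lo, hi


mins = [
    AgeCandidate(18, 0.9, "Participants aged 18 to 65 were enrolled."),
    AgeCandidate(18, 0.6, "Smokers aged 18 or older were eligible."),
    AgeCandidate(12, 0.8, "Prevention should start at age 12."),  # dropped: cue "should"
]
maxs = [
    AgeCandidate(65, 0.85, "Participants aged 18 to 65 were enrolled."),
    AgeCandidate(70, 0.3, "Participants up to 70 were screened."),  # dropped: low score
]
print(predict_min_max(mins, maxs))  # (18, 65)
```

Summing scores per value, rather than taking the single best answer, rewards an age that the model extracts repeatedly from different passages of the same article; a learned classifier could replace the cue-word filter.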
