Query selection methods for automated corpora construction with a use case in food-drug interactions

In this paper, we address the problem of automatically constructing a relevant corpus of scientific articles about food-drug interactions. A growing number of scientific publications describe food-drug interactions, but building a high-coverage corpus that can be used for information extraction is currently not trivial. We investigate several methods for automating the query selection process using an expert-curated corpus of food-drug interactions. Our experiments show that index term features combined with a decision tree classifier are the best approach for this task, and that feature selection approaches, in particular gain ratio, outperform frequency-based methods for query selection.
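The abstract does not say how gain ratio scores candidate query terms. The following is a minimal illustrative sketch, not the authors' implementation. It assumes each document is reduced to a set of index terms and labeled relevant or not relevant. Each term is scored as a binary present/absent split using the standard C4.5-style gain ratio (information gain divided by split information) and compared with a simple frequency baseline. The function names, the toy data, and the choice of baseline (document frequency among relevant documents) are all hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h


def gain_ratio(docs, labels, term):
    """Gain ratio of a binary 'term present?' split (hypothetical helper).

    docs: list of sets of index terms; labels: list of bools (True = relevant).
    """
    n = len(docs)
    h_class = entropy(list(Counter(labels).values()))
    # Partition the class labels by presence/absence of the term.
    with_term = [lab for d, lab in zip(docs, labels) if term in d]
    without_term = [lab for d, lab in zip(docs, labels) if term not in d]
    cond_entropy = 0.0
    split_sizes = []
    for subset in (with_term, without_term):
        if subset:
            cond_entropy += len(subset) / n * entropy(list(Counter(subset).values()))
            split_sizes.append(len(subset))
    info_gain = h_class - cond_entropy
    split_info = entropy(split_sizes)  # intrinsic value of the split
    return info_gain / split_info if split_info > 0 else 0.0


def rank_terms(docs, labels, k):
    """Top-k candidate query terms by gain ratio (ties broken alphabetically)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-gain_ratio(docs, labels, t), t))[:k]


def rank_by_frequency(docs, labels, k):
    """Assumed frequency baseline: document frequency among relevant documents."""
    df = Counter(t for d, lab in zip(docs, labels) if lab for t in d)
    return sorted(df, key=lambda t: (-df[t], t))[:k]


# Hypothetical toy corpus: two relevant and two irrelevant abstracts.
docs = [
    {"grapefruit", "drug", "cyp3a4"},
    {"grapefruit", "drug"},
    {"drug", "dosage"},
    {"drug", "diet"},
]
labels = [True, True, False, False]

if __name__ == "__main__":
    print(rank_terms(docs, labels, 1))         # → ['grapefruit']
    print(rank_by_frequency(docs, labels, 1))  # → ['drug']
```

On this toy data, gain ratio ranks "grapefruit" first because it separates relevant from irrelevant documents perfectly. The frequency baseline instead ties "drug" with "grapefruit" and, after alphabetical tie-breaking, returns "drug", even though it occurs in every document and carries no discriminative information. This illustrates why a discriminative criterion might outperform raw frequency when selecting query terms.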
