A system for extracting study design parameters from nutritional genomics abstracts.

Extracting study design parameters from biomedical journal articles is an important problem in natural language processing (NLP). These parameters characterize a study, for example its duration, the number of subjects, and the subjects' profile. Here we present a system that extracts study design parameters from individual sentences in article abstracts. The system will serve as a component of a larger system for building nutrigenomics networks from articles in the nutritional genomics domain. Its algorithms consist of manually designed rules, expressed either as regular expressions or in terms of sentence parse structure, combined with a number of filters and NLP tools in a pipelined framework. With this approach, our system extracts parameters at a finer level of granularity than comparable systems and produces results that surpass the current state of the art.
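To make the idea of regular-expression rules concrete, the following is a minimal sketch of how a sentence-level rule for study duration and number of subjects might look. It is illustrative only: the patterns, the noun list, and the function name `extract_parameters` are assumptions made for this example and are not the rules used by the system described above, which also relies on parse structure, filters, and other NLP tools.

```python
import re

# Hypothetical pattern: a number followed by a time unit, e.g. "12 weeks" or "6-month".
DURATION_RE = re.compile(
    r"\b(\d+(?:\.\d+)?)[\s-]*(day|week|wk|month|mo|year|yr)s?\b",
    re.IGNORECASE,
)

# Hypothetical pattern: a count, up to three modifier words, then a subject noun,
# e.g. "48 healthy overweight men".
SUBJECTS_RE = re.compile(
    r"\b(\d+)\s+(?:[a-z-]+\s+){0,3}?"
    r"(subjects|participants|patients|volunteers|men|women|adults|children)\b",
    re.IGNORECASE,
)


def extract_parameters(sentence):
    """Return duration and subject-count mentions found in one sentence."""
    result = {"duration": [], "subjects": []}
    for match in DURATION_RE.finditer(sentence):
        result["duration"].append((float(match.group(1)), match.group(2).lower()))
    for match in SUBJECTS_RE.finditer(sentence):
        result["subjects"].append((int(match.group(1)), match.group(2).lower()))
    return result


if __name__ == "__main__":
    text = "A total of 48 healthy overweight men consumed the diet for 12 weeks."
    print(extract_parameters(text))
    # {'duration': [(12.0, 'week')], 'subjects': [(48, 'men')]}
```

A rule of this kind handles only counts written as digits and a fixed vocabulary of units and subject nouns; spelled-out numbers or unusual phrasing would need additional rules or parse-based patterns.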
