Tagging Funding Agencies and Grants in Scientific Articles using Sequential Learning Models

In this paper we present a solution for tagging funding bodies and grants in scientific articles using a combination of trained sequential learning models, namely conditional random fields (CRF), hidden markov models (HMM) and maximum entropy models (MaxEnt), on a benchmark set created in-house. We apply the trained models to address the BioASQ challenge 5c, which is a newly introduced task that aims to solve the problem of funding information extraction from scientific articles. Results in the dry-run data set of BioASQ task 5c show that the suggested approach can achieve a micro-recall of more than 85% in tagging both funding bodies and grants.

[1]  Hermann Ney,et al.  Maximum Entropy Models for Named Entity Recognition , 2003, CoNLL.

[2]  Siddhartha Jonnalagadda,et al.  NEMO: Extraction and normalization of organization names from PubMed affiliation strings , 2010, Journal of biomedical discovery and collaboration.

[3]  Hwee Tou Ng,et al.  Named Entity Recognition: A Maximum Entropy Approach Using Global Information , 2002, COLING.

[4]  Wei Li,et al.  Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons , 2003, CoNLL.

[5]  B. Carpenter,et al.  LingPipe for 99.99% Recall of Gene Mentions , 2007 .

[6]  Kentaro Torisawa,et al.  Exploiting Wikipedia as External Knowledge for Named Entity Recognition , 2007, EMNLP.

[7]  Adam L. Berger,et al.  A Maximum Entropy Approach to Natural Language Processing , 1996, CL.

[8]  Oren Etzioni,et al.  Named Entity Recognition in Tweets: An Experimental Study , 2011, EMNLP.

[9]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[10]  Mitchell P. Marcus,et al.  Maximum entropy models for natural language ambiguity resolution , 1998 .

[11]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[12]  Panagiotis Stamatopoulos,et al.  RULE-BASED NAMED ENTITY RECOGNITION FOR GREEK FINANCIAL TEXTS , 2000 .

[13]  Frederick Reiss,et al.  Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks , 2010, EMNLP.

[14]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[15]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[16]  Tong Zhang,et al.  Named Entity Recognition through Classifier Combination , 2003, CoNLL.

[17]  Satoshi Sekine,et al.  A survey of named entity recognition and classification , 2007 .

[18]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[19]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[20]  Muin J. Khoury,et al.  An automatic method to generate domain-specific investigator networks using PubMed abstracts , 2007, BMC Medical Informatics Decis. Mak..

[21]  Christopher D. Manning,et al.  Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling , 2005, ACL.

[22]  David Yarowsky,et al.  Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence , 1999, EMNLP.

[23]  James Richard Curran,et al.  From distributional to semantic similarity , 2004 .

[24]  Thomas G. Dietterich Machine Learning for Sequential Data: A Review , 2002, SSPR/SPR.

[25]  Jian Su,et al.  Named Entity Recognition using an HMM-based Chunk Tagger , 2002, ACL.

[26]  Richard M. Schwartz,et al.  An Algorithm that Learns What's in a Name , 1999, Machine Learning.

[27]  Georgios Balikas,et al.  An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition , 2015, BMC Bioinformatics.