Tagging Funding Agencies and Grants in Scientific Articles using Sequential Learning Models

In this paper, we present a solution for tagging funding bodies and grants in scientific articles using a combination of sequential learning models, namely conditional random fields (CRF), hidden Markov models (HMM), and maximum entropy models (MaxEnt), trained on a benchmark set created in-house. We apply the trained models to BioASQ challenge 5c, a newly introduced task on extracting funding information from scientific articles. Results on the dry-run data set of BioASQ task 5c show that the suggested approach can achieve a micro-recall of more than 85% in tagging both funding bodies and grants.
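The micro-recall figure quoted in the abstract is commonly obtained by pooling true positives and gold items over all articles before dividing, rather than averaging per-article scores. The following is a minimal sketch of that calculation, assuming exact-match comparison of extracted strings per article. The function name `micro_recall` and the sample grant identifiers are hypothetical and are not taken from the paper or from the BioASQ evaluation code.

```python
from typing import Iterable, Set


def micro_recall(gold: Iterable[Set[str]], predicted: Iterable[Set[str]]) -> float:
    """Micro-averaged recall: pool counts over all articles, then divide."""
    true_positives = 0
    gold_total = 0
    for gold_items, predicted_items in zip(gold, predicted):
        # An extracted item counts as correct only on exact match (assumption).
        true_positives += len(gold_items & predicted_items)
        gold_total += len(gold_items)
    # Guard against an empty gold standard.
    return true_positives / gold_total if gold_total else 0.0


# Hypothetical gold and predicted grant IDs for two articles.
gold_grants = [{"R01-123", "R01-456"}, {"EP/X/1"}]
predicted_grants = [{"R01-123"}, {"EP/X/1", "ABC-9"}]

score = micro_recall(gold_grants, predicted_grants)
print(f"micro-recall: {score:.3f}")
```

Because counts are pooled, articles with many funding mentions weigh more heavily in the score than articles with few. Spurious predictions such as `ABC-9` do not lower recall; they would only affect precision.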

George Tsatsaronis | Subhradeep Kayal | Sophia Katrenko | Zubair Afzal | Michelle Gregory | Pascal Coupet | Marius A. Doornenbal
