Learning to Annotate Scientific Publications

Annotating scientific publications with keywords and phrases is of great importance for searching, indexing, and cataloging such documents. Unlike previous studies, which focused on user-centric annotation, this paper investigates the characteristics of service-centric annotation. Using a large number of publicly available annotated scientific publications, we characterized and compared the two types of annotation process. Furthermore, we developed an automatic approach to annotating scientific publications, based on a machine learning algorithm and a set of novel features. Compared with other methods, our approach shows significantly improved performance. The experimental data sets and evaluation results are publicly available at the supplementary website.
