Connections across Scientific Publications based on Semantic Annotations

Abstract. The core information in scientific publications is encoded in natural-language text within monolithic documents and is therefore poorly integrated with other structured and unstructured data resources. Text requires additional processing before publications can be semantically interlinked and the data they contain made interoperable. Data infrastructures such as the Linked Open Data initiative, which builds on the Resource Description Framework (RDF), support connecting data from scientific publications once concepts and relations have been identified and the content has been semantically interconnected. In this manuscript we produce and analyze semantic annotations of scientific articles to investigate the interconnectivity across articles. In an initial experiment on articles from PubMed Central, we demonstrate the means and the results that lead to this interconnectivity using annotations of Medical Subject Headings (MeSH) concepts, Unified Medical Language System (UMLS) terms, and semantic abstractions of relations. We conclude that the different methods contribute different types of relatedness between articles, which could later be used in recommendation systems based on semantic links across a network of scientific publications.
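
As a rough illustration of how shared annotations can connect articles, the following minimal sketch links two articles when their sets of concept annotations overlap sufficiently. It is not the method used in the manuscript: the Jaccard measure, the threshold, the article identifiers, and the concept identifiers are all assumptions chosen for illustration.

```python
from itertools import combinations

# Hypothetical data: article ID -> set of concept identifiers assigned by an
# annotator. The IDs are illustrative placeholders, not real results.
annotations = {
    "PMC-A": {"D001943", "D009369", "D015870"},
    "PMC-B": {"D001943", "D009369", "D000368"},
    "PMC-C": {"D003920", "D007333"},
}


def jaccard(a, b):
    """Share of concepts common to both sets, relative to all concepts in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_articles(annotations, threshold=0.3):
    """Return (article, article, score) triples whose overlap meets the threshold.

    The threshold value is an assumption; a real system would tune it.
    """
    links = []
    for (id_a, ann_a), (id_b, ann_b) in combinations(sorted(annotations.items()), 2):
        score = jaccard(ann_a, ann_b)
        if score >= threshold:
            links.append((id_a, id_b, score))
    return links


links = link_articles(annotations)
```

With the placeholder data above, only PMC-A and PMC-B share enough concepts to be linked. Running the same procedure separately on MeSH concepts, UMLS terms, or relation abstractions would yield different link sets, which is one way to picture the different types of relatedness mentioned in the abstract.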
