Mining Semantic Descriptions of Bioinformatics Web Resources from the Literature

A number of projects (myGrid, BioMOBY, etc.) have recently been initiated to organise emerging bioinformatics Web services and provide semantic descriptions of them. They typically rely on manual curation. In this paper we focus on a semi-automated approach to mining semantic descriptions from the bioinformatics literature. The method combines terminological processing and dependency parsing of journal articles, and applies information extraction techniques to profile Web services using informative textual passages, related ontological annotations and service descriptors. Service descriptors are terminological phrases reflecting related concepts (e.g. tasks, approaches, data) and/or specific roles (e.g. input/output parameters) of the associated resource classes (e.g. algorithms, databases). They can be used to facilitate subsequent manual description of services, and also to provide a semantic synopsis of a service that can be used to locate related services. We present a case study involving full-text articles from the BMC Bioinformatics journal. We illustrate the potential of natural language processing not only for mining descriptions of known services, but also for discovering new services that have been described in the literature.
