Transparent mediation-based access to multiple yeast data sources using an ontology driven interface

Background: Saccharomyces cerevisiae is recognized as a model system representing a simple eukaryote whose genome can be easily manipulated. The information that scientists seek about its biological entities (proteins, genes, RNAs, etc.) is scattered across several data sources such as SGD, Yeastract, CYGD-MIPS, BioGrid, and PhosphoGrid. Because these sources are heterogeneous, querying them separately and then manually combining the returned results is a complex and time-consuming task for biologists, most of whom are not bioinformatics experts. It also limits the use that can be made of the available data.

Results: To provide transparent and simultaneous access to yeast sources, we have developed YeastMed, an XML and mediator-based system. In this paper, we present our approach to developing this system, which takes advantage of SB-KOM to perform the required query transformation and of a set of Data Services to reach the integrated data sources. The system is composed of a set of modules that depend heavily on XML and Semantic Web technologies. User queries are expressed in terms of a domain ontology through a simple form-based web interface.

Conclusions: YeastMed is the first mediation-based system designed specifically for integrating yeast data sources. It was conceived mainly to help biologists find relevant data from multiple data sources simultaneously. It has a biologist-friendly interface that is easy to use. The system is available at http://www.khaos.uma.es/yeastmed/.
