WebScipio: An online tool for the determination of gene structures using protein sequences

Background
Obtaining the gene structure of a given protein-coding gene is an important step in many analyses. Software suited to this task should be readily accessible, accurate and easy to handle, and should provide the user with a coherent representation of the most probable gene structure. It should be rigorous enough to optimise features at the level of single bases and, at the same time, flexible enough to allow cross-species searches.

Results
WebScipio, a web interface to the Scipio software, allows a user to obtain the coding sequence structure corresponding to a given query protein sequence that belongs to an already assembled eukaryotic genome. The resulting gene structure is presented in various human-readable formats, such as a schematic representation and a detailed alignment of the query and target sequences that highlights any discrepancies. WebScipio can also be used to identify and characterise the gene structures of homologs in related organisms. In addition, it offers a web service for integration with other programs.

Conclusion
WebScipio is a tool that allows users to obtain a high-quality gene structure prediction from a protein query. It provides access to more than 250 searchable eukaryotic genomes and produces predictions close to what can be achieved by manual annotation, for in-species and cross-species searches alike. WebScipio is freely accessible at http://www.webscipio.org.
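The kind of per-base check behind a query/target alignment that highlights discrepancies can be illustrated with a toy sketch. This is not Scipio's or WebScipio's algorithm. It assumes the exon coordinates are already known (hypothetical 0-based, half-open, plus-strand intervals), uses the standard genetic code, flags introns lacking the canonical GT...AG boundaries, and compares the translated coding sequence to the query without gaps. A real tool must also locate the exons, handle the minus strand and align with insertions and deletions. All function names here are illustrative.

```python
from typing import List, Tuple

# Standard genetic code: codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Hypothetical exon representation: 0-based, half-open [start, end) on the plus strand.
Exon = Tuple[int, int]


def splice(genome: str, exons: List[Exon]) -> str:
    """Concatenate the exon sequences into a coding sequence."""
    return "".join(genome[s:e] for s, e in exons)


def translate(cds: str) -> str:
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def noncanonical_introns(genome: str, exons: List[Exon]) -> List[int]:
    """Return the indices of introns whose boundaries are not GT...AG."""
    bad = []
    for n, ((_, end), (start, _)) in enumerate(zip(exons, exons[1:])):
        intron = genome[end:start]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            bad.append(n)
    return bad


def discrepancies(query: str, translated: str) -> List[Tuple[int, str, str]]:
    """List (position, query residue, translated residue) where they differ.

    Gap-free comparison only; a real alignment would handle indels.
    Residues present in only one sequence are paired with '-'.
    """
    diffs = [(i, q, t) for i, (q, t) in enumerate(zip(query, translated)) if q != t]
    for i in range(min(len(query), len(translated)), max(len(query), len(translated))):
        diffs.append((i,
                      query[i] if i < len(query) else "-",
                      translated[i] if i < len(translated) else "-"))
    return diffs


if __name__ == "__main__":
    # Toy genome: exon ATGGCC, intron GTAAGTTTAG, exon AAATGG.
    genome = "ATGGCC" + "GTAAGTTTAG" + "AAATGG"
    exons = [(0, 6), (16, 22)]
    protein = translate(splice(genome, exons))
    print(protein)                                # MAKW
    print(noncanonical_introns(genome, exons))    # []
    print(discrepancies("MAKY", protein))         # [(3, 'Y', 'W')]
```

In this toy example the spliced sequence ATGGCCAAATGG translates to MAKW, the single intron has canonical boundaries, and the only discrepancy with the query MAKY is at residue 3.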
