WebGMAP: a web service for mapping and aligning cDNA sequences to genomes

The genomes of thousands of organisms are being sequenced, often together with cDNA or EST sequences. A major challenge in bioinformatics is to make these genomic sequences and their annotations accessible, in a user-friendly manner, to biologists in general so that they can address interesting biological questions. We have created an open-access web service called WebGMAP (http://www.bioinfolab.org/software/webgmap) that seamlessly integrates cDNA-genome alignment tools, such as GMAP, with easy-to-use data visualization and mining tools. This web service is intended to support community efforts to improve genome annotation, determine accurate gene structures and their variations, and explore important biological processes such as alternative splicing and alternative polyadenylation. For routine sequence analysis, WebGMAP provides a web-based sequence viewer with many useful functions, including nucleotide positioning, six-frame translation, reverse complementation, and imperfect motif detection and alignment. WebGMAP also lets users sort, filter and search individual cDNA sequences and cDNA-genome alignments. Our EST-Genome-Browser can display annotated gene structures and cDNA-genome alignments at scales from 100 to 50 000 nt. Because it highlights base differences between query cDNAs and the genome, the EST-Genome-Browser allows biologists to discover potential point or insertion-deletion variations from cDNA-genome alignments.
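To make the sequence-viewer functions above concrete, here is a minimal sketch of three of them: reverse complementation, six-frame translation with the standard genetic code, and imperfect motif detection as a mismatch-tolerant (Hamming distance) scan reporting 1-based positions. This is not WebGMAP's implementation; the function names, the `max_mismatches` parameter, and the choice of Hamming distance as the "imperfect" criterion are assumptions made for illustration.

```python
# Hypothetical sketch, not WebGMAP's code.
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Complement map; N (unknown base) stays N, case is preserved.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' marks codons with non-ACGT bases."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict[str, str]:
    """Frames +1..+3 read the given strand; -1..-3 read its reverse complement."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_motif(seq: str, motif: str, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Scan the given strand for windows within max_mismatches of motif (Hamming distance).

    Returns (1-based start position, mismatch count) for each hit.
    """
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    hits = []
    for start in range(len(seq) - width + 1):
        mismatches = sum(a != b for a, b in zip(seq[start:start + width], motif))
        if mismatches <= max_mismatches:
            hits.append((start + 1, mismatches))
    return hits


if __name__ == "__main__":
    cdna = "ATGGCCAATAAAGGTTGA"
    print(six_frame_translations(cdna)["+1"])  # MANKG*
    # AATAAA is the canonical mammalian poly(A) signal; allow one mismatch.
    print(find_motif(cdna, "AATAAA", max_mismatches=1))
```

A plain Hamming scan cannot model insertions or deletions inside a motif; a tool that also aligns imperfect motifs would need a small dynamic-programming alignment instead.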
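Highlighting base differences between a query cDNA and the genome amounts to walking the aligned rows column by column. The sketch below assumes the alignment is available as two equal-length gapped strings ("-" for gaps) and reports substitutions, cDNA-only insertions, and genome-only deletions in 1-based genome coordinates. The names `Difference` and `call_differences` and the `min_intron` cutoff of 20 nt are hypothetical. Long genome-only gaps are treated as introns here only for simplicity; in practice intron boundaries would come from the spliced aligner's own output. Differences found this way are only candidate variants, since single-pass EST reads also contain sequencing errors.

```python
from dataclasses import dataclass


@dataclass
class Difference:
    kind: str        # "SNV", "INS" (bases only in the cDNA) or "DEL" (genome bases missing from the cDNA)
    genome_pos: int  # 1-based genome coordinate; for INS, the genome base just before the inserted bases
    genome: str      # genome bases ("-" for INS)
    cdna: str        # cDNA bases ("-" for DEL)


def call_differences(genome_aln: str, cdna_aln: str,
                     genome_start: int = 1, min_intron: int = 20) -> list[Difference]:
    """Walk two equal-length gapped alignment rows and list base differences.

    Genome-only runs at least min_intron long are treated as introns
    (a hypothetical cutoff) and are not reported.
    """
    if len(genome_aln) != len(cdna_aln):
        raise ValueError("alignment rows must have equal length")
    diffs: list[Difference] = []
    gpos = genome_start - 1  # genome coordinate of the last genome base consumed
    i, n = 0, len(genome_aln)
    while i < n:
        g, c = genome_aln[i], cdna_aln[i]
        if g == "-" and c == "-":      # empty column, nothing to report
            i += 1
        elif g != "-" and c != "-":    # aligned pair: match or substitution
            gpos += 1
            if g.upper() != c.upper():
                diffs.append(Difference("SNV", gpos, g, c))
            i += 1
        elif c == "-":                 # genome-only run: deletion or intron
            j = i
            while j < n and cdna_aln[j] == "-" and genome_aln[j] != "-":
                j += 1
            run = genome_aln[i:j]
            if len(run) < min_intron:
                diffs.append(Difference("DEL", gpos + 1, run, "-"))
            gpos += len(run)
            i = j
        else:                          # cDNA-only run: insertion
            j = i
            while j < n and genome_aln[j] == "-" and cdna_aln[j] != "-":
                j += 1
            diffs.append(Difference("INS", gpos, "-", cdna_aln[i:j]))
            i = j
    return diffs


if __name__ == "__main__":
    for d in call_differences("ACGTAC-GTAC", "ACGAACTG-AC"):
        print(d)
```

On the toy alignment this reports a T-to-A substitution at genome position 4, a one-base insertion after position 6, and a one-base deletion at position 8.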
