XenDB: Full length cDNA prediction and cross species mapping in Xenopus laevis

Background: Research using the model system Xenopus laevis has provided critical insights into the mechanisms of early vertebrate development and cell biology. Large scale sequencing efforts have provided an increasingly important resource for researchers. To take full advantage of the available sequence, we have analyzed 350,468 Xenopus laevis Expressed Sequence Tags (ESTs), both to identify full length protein encoding sequences and to develop a unique database system to support comparative approaches between X. laevis and other model systems.

Description: Using a suffix array based clustering approach, we have identified 25,971 clusters and 40,877 singleton sequences. Generation of a consensus sequence for each cluster resulted in 31,353 tentative contigs and 4,801 singleton sequences. Using both BLASTX and FASTY comparisons to five model organisms and the NR protein database, more than 15,000 sequences are predicted to encode full length proteins, and these have been matched to publicly available IMAGE clones where such clones exist. Each sequence has been compared to the KOG database, and ~67% of the sequences have been assigned a putative functional category. Putative GO annotations have been determined based on sequence homology to mouse and human.

Conclusion: The results of the analysis have been stored in a publicly available database, XenDB (http://bibiserv.techfak.uni-bielefeld.de/xendb/). A unique capability of the database is the ability to batch upload cross species queries to identify potential Xenopus homologues and their associated full length clones. Examples are provided, including mapping of microarray results and application of 'in silico' analysis. The ability to quickly translate results from various species into 'Xenopus-centric' information should greatly enhance comparative embryological approaches.

Supplementary material can be found at http://bibiserv.techfak.uni-bielefeld.de/xendb/.
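To illustrate how a suffix array based clustering step of the kind described above can separate sequences into clusters and singletons, the following minimal Python sketch may help. It is not the XenDB pipeline: the function name `cluster_by_shared_substring`, the `min_len` threshold, and the toy sequences are assumptions made for illustration. It builds a naive sorted suffix list rather than an enhanced suffix array, and it ignores reverse complements, sequencing errors, and consensus generation, all of which a real EST clustering would need to handle.

```python
from collections import defaultdict


def cluster_by_shared_substring(seqs, min_len):
    """Group sequences that share an exact substring of at least min_len.

    Illustrative only: a naive generalized suffix array (sorted list of
    (sequence index, offset) pairs) combined with union-find.
    """
    # Every suffix long enough to contain a min_len match, sorted lexicographically.
    suffixes = sorted(
        ((i, j) for i, s in enumerate(seqs) for j in range(len(s) - min_len + 1)),
        key=lambda p: seqs[p[0]][p[1]:],
    )

    parent = list(range(len(seqs)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Suffixes sharing a min_len prefix are contiguous in sorted order,
    # so comparing neighbours is enough to link every matching pair.
    for (a, i), (b, j) in zip(suffixes, suffixes[1:]):
        if seqs[a][i:i + min_len] == seqs[b][j:j + min_len]:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for idx in range(len(seqs)):
        groups[find(idx)].append(idx)

    clusters = sorted(g for g in groups.values() if len(g) > 1)
    singletons = sorted(g[0] for g in groups.values() if len(g) == 1)
    return clusters, singletons


# Toy input (hypothetical sequences, not real ESTs).
toy = ["ACGTACGTTT", "CGTTTGGA", "GGGGCCCC", "TTGGACCA"]
clusters, singletons = cluster_by_shared_substring(toy, min_len=5)
```

In this toy input, the first, second, and fourth sequences are linked transitively through shared 5-mers, while the third shares nothing with the others and remains a singleton. A cluster can later yield more than one consensus sequence, which is one way the number of contigs can exceed the number of clusters.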
