SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein

We have integrated the protein families from SYSTERS and the expressed sequence tag (EST) clusters from our database GeneNest with SpliceNest, a new database mapping EST contigs into genomic DNA. The SYSTERS protein sequence cluster set provides an automatically generated classification of all sequences of the SWISS-PROT, TrEMBL and PIR databases into disjoint protein family and superfamily clusters. GeneNest is a database and software package for producing and visualizing gene indices from ESTs and mRNAs. Currently, the database comprises gene indices of human, mouse, Arabidopsis thaliana and zebrafish. SpliceNest is a web-based graphical tool to explore gene structure, including alternative splicing, based on a mapping of the EST consensus sequences from GeneNest to the complete human genome. The integration of SYSTERS, GeneNest and SpliceNest into one framework now permits an overall exploration of the whole sequence space covering protein, mRNA and EST sequences, as well as genomic DNA. The databases are available for querying and browsing at http://cmb.molgen.mpg.de.

[1]  G. Schuler Pieces of the puzzle: expressed sequence tags and the catalog of human genes , 1997, Journal of Molecular Medicine.

[2]  Pierre Nicodème,et al.  SSMAL: similarity searching with alignment graphs , 1998, Bioinform..

[3]  Martin Vingron,et al.  SpliceNest: visualizing gene structure and alternative splicing based on EST clusters , 2002 .

[4]  Evelyn Camon,et al.  The EMBL Nucleotide Sequence Database , 2004, Nucleic acids research.

[5]  Peter B. McGarvey,et al.  Protein Information Resource: a community resource for expert annotation of protein data , 2001, Nucleic Acids Res..

[6]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence data bank and its supplement TrEMBL , 1997, Nucleic Acids Res..

[7]  G. Rubin,et al.  A computer program for aligning a cDNA sequence with a genomic DNA sequence. , 1998, Genome research.

[8]  M Vingron,et al.  GeneNest: automated generation and visualization of gene indices. , 2000, Trends in genetics : TIG.

[9]  Martin Vingron,et al.  A set-theoretic approach to database searching and clustering , 1998, Bioinform..

[10]  J. V. Moran,et al.  Initial sequencing and analysis of the human genome. , 2001, Nature.

[11]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[12]  Rolf Apweiler,et al.  The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 , 2000, Nucleic Acids Res..

[13]  Chris Sander,et al.  MView: a web-compatible database search or multiple alignment viewer , 1998, Bioinform..