ProtSweep, 2Dsweep and DomainSweep: protein analysis suite at DKFZ

The wealth of transcript information made publicly available in recent years has led to a large pool of individual web sites offering access to bioinformatics software. However, finding out which services exist, what they can and cannot do, how to use them, and how to pass results from one service to the next in the right format can be very time- and resource-consuming, especially for non-experts. To automate this task, we present a suite of protein annotation pipelines (tasks) developed at the German Cancer Research Center (DKFZ), oriented to protein annotation by homology (ProtSweep), by domain analysis (DomainSweep), and by secondary structure elements (2Dsweep). The aim of these tasks is to perform an exhaustive structural and functional analysis by combining a wide variety of methods with the most up-to-date public databases. The three servers are available to academic users at the HUSAR open server: http://genius.embnet.dkfz-heidelberg.de/menu/biounit/open-husar/
