PipeAlign: a new toolkit for protein family analysis

PipeAlign is a protein family analysis tool integrating a five-step process ranging from the search for sequence homologues in protein and 3D structure databases to the definition of the hierarchical relationships within and between subfamilies. The complete, automatic pipeline takes a single sequence or a set of sequences as input and constructs a high-quality, validated MACS (multiple alignment of complete sequences) in which sequences are clustered into potential functional subgroups. For the more experienced user, the PipeAlign server also provides numerous options to run only part of the analysis and to modify the default parameters of each software module. For example, the user can enter an existing multiple sequence alignment for refinement, validation and subsequent clustering of the sequences. The aim is to provide an interactive workbench for the validation, integration and presentation of a protein family, not only at the sequence level, but also at the structural and functional levels. PipeAlign is available at http://igbmc.u-strasbg.fr/PipeAlign/.
