XDOM, a graphical tool to analyse domain arrangements in any set of protein sequences

MOTIVATION: To extract the maximum possible information from a set of protein sequences, its modular organization must be known and clearly displayed. This is important for both structural and functional analysis.

RESULTS: This paper presents an algorithm and a graphical interface called XDOM, which performs a systematic analysis of the modular organization of any set of protein sequences. The algorithm is an automatic method to identify putative domains from sequence comparisons. The graphical tool displays each protein as a set of linked boxes corresponding to its domains. The method has been tested on a family of bacterial proteins and on whole genomes. It is currently applied to the complete SWISS-PROT database to build the ProDom database.

AVAILABILITY: XDOM is available free of charge by anonymous ftp: ftp://ftp.toulouse.inra.fr/pub/xdom. The ProDom database can be consulted at http://protein.toulouse.inra.fr/prodom.html.
