ProtoNet: hierarchical classification of the protein space

The ProtoNet site provides an automatic hierarchical clustering of the SWISS-PROT protein database. The clustering is based on an all-against-all BLAST similarity search. The E-scores of these similarities drive a continuous bottom-up clustering process that applies alternative rules for merging clusters. The outcome of this process is a classification of the input proteins into a hierarchy of clusters of varying degrees of granularity. ProtoNet (version 1.3) is accessible as an interactive web site at http://www.protonet.cs.huji.ac.il. It provides navigation tools for monitoring the clustering process in both vertical and horizontal views. Each cluster at every level of the hierarchy is assigned a statistical index that indicates its purity with respect to biological keywords, such as those provided by SWISS-PROT and InterPro. ProtoNet can be used for function prediction, for defining superfamilies and subfamilies, and for large-scale protein annotation.
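The bottom-up process described above can be illustrated with a minimal sketch. It assumes that pairwise BLAST E-scores are already available (lower means more similar) and that one possible merging rule is average linkage, where the distance between two clusters is the arithmetic mean of the E-scores over all cross-cluster pairs. The functions `cluster_bottom_up` and `keyword_purity`, the default score `missing` for pairs with no BLAST hit, and the majority-keyword purity measure are all hypothetical illustrations. They are not ProtoNet's code, its exact merging rules, or its actual statistical index.

```python
from collections import Counter
from itertools import combinations


def linkage(a, b, evalues, missing):
    """Hypothetical average-linkage rule: arithmetic mean of the E-scores
    over every pair (i in a, j in b). Pairs without a recorded BLAST hit
    receive the placeholder score `missing`."""
    scores = [evalues.get(frozenset((i, j)), missing) for i in a for j in b]
    return sum(scores) / len(scores)


def cluster_bottom_up(n, evalues, missing=1.0, max_score=None):
    """Agglomerative clustering of proteins 0..n-1.

    evalues maps frozenset({i, j}) -> E-score (lower = more similar).
    At each step the two clusters with the lowest linkage score are merged.
    Returns the merge history as a list of (merged_cluster, score) tuples,
    which together describe the hierarchy from singletons to the root.
    If max_score is given, merging stops once the best score exceeds it.
    """
    clusters = [frozenset([i]) for i in range(n)]
    history = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters under the linkage rule.
        (a, b), score = min(
            (((a, b), linkage(a, b, evalues, missing))
             for a, b in combinations(clusters, 2)),
            key=lambda item: item[1],
        )
        if max_score is not None and score > max_score:
            break
        clusters.remove(a)
        clusters.remove(b)
        merged = a | b
        clusters.append(merged)
        history.append((merged, score))
    return history


def keyword_purity(cluster, keywords):
    """Hypothetical purity index: the fraction of cluster members that carry
    the cluster's most frequent keyword (each keyword counted once per protein)."""
    counts = Counter(k for p in cluster for k in set(keywords.get(p, ())))
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / len(cluster)


if __name__ == "__main__":
    # Toy data: proteins 0/1 and 2/3 form two tight pairs with a weak 0-2 link.
    evalues = {
        frozenset((0, 1)): 1e-50,
        frozenset((2, 3)): 1e-40,
        frozenset((0, 2)): 1e-3,
    }
    keywords = {0: ["kinase"], 1: ["kinase"], 2: ["kinase", "membrane"], 3: ["membrane"]}
    for merged, score in cluster_bottom_up(4, evalues):
        print(sorted(merged), score, keyword_purity(merged, keywords))
```

Because every merge is recorded, each entry in the history corresponds to one node of the hierarchy, and a per-node purity score can be attached to it. Passing `max_score` cuts the process at a coarser or finer level of granularity.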
