A unifold, mesofold, and superfold model of protein fold use

As more and more protein structures are determined, there is increasing interest in the question of how many different folds have been used in biology. The history of the rate of discovery of new folds and the distribution of sequence families among known folds provide a means of estimating the underlying distribution of fold use. Previous models exploiting these data have led to rather different conclusions on the total number of folds. We present a new model, based on the notion that the folds used in biology fall naturally into three classes: unifolds, that is, folds found only in a single narrow sequence family; mesofolds, found in an intermediate number of families; and the previously noted superfolds, found in many protein families. We show that this model fits the available data well and has predicted the development of SCOP over the past 2 years. The principal implications of the model are as follows: (1) the vast majority of folds will be found in only a single sequence family; (2) the total number of folds is at least 10,000; and (3) 80% of sequence families have one of about 400 folds, most of which are already known. Proteins 2002;46:61–71. © 2001 Wiley‐Liss, Inc.
