A re-estimation for the total numbers of protein folds and superfamilies.

The number of protein folds remains controversial, despite its importance for understanding evolution and for predicting protein structure from amino acid sequence. Working from different assumptions, several research groups have tackled this problem and obtained very different results. In the present study, a more rigorous statistical approach is used to address this question. From three different data sets, the total number of protein folds is estimated to be about 650. A detailed theoretical analysis suggests that (i) the non-transmembrane protein families selected for crystallization and structure determination form a random sample, and (ii) apart from about 40 folds, the protein folds occurring in nature contain roughly the same number of different protein families. Given this estimate of the total number of folds, the number of naturally occurring superfamilies is estimated to be about 1150.
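The abstract does not spell out the estimator. As a rough illustration only, its two conclusions (families are sampled at random, and folds hold roughly equal numbers of families) suggest a simple occupancy model: if N equally sized folds exist and n families are sampled, the expected number of distinct folds observed is N(1 - (1 - 1/N)^n), and N can be recovered by solving that equation for an observed fold count. This is a hypothetical sketch, not the paper's method. It treats sampling as with replacement, ignores the roughly 40 exceptional folds, and the function names are illustrative.

```python
def expected_distinct_folds(total_folds: float, sampled_families: int) -> float:
    """Expected number of distinct folds seen after sampling families uniformly.

    Assumes (hypothetically) that every fold holds the same share of families
    and that families are drawn with replacement.
    """
    return total_folds * (1.0 - (1.0 - 1.0 / total_folds) ** sampled_families)


def estimate_total_folds(observed_folds: float, sampled_families: int,
                         tol: float = 1e-6) -> float:
    """Solve expected_distinct_folds(N, n) == observed_folds for N by bisection."""
    if not 0 < observed_folds < sampled_families:
        raise ValueError("need 0 < observed_folds < sampled_families for a finite estimate")
    # At N = observed_folds the expected count is at most the observation,
    # so this is a valid lower bracket.
    lo = hi = float(observed_folds)
    # The expected count rises toward sampled_families as N grows,
    # so doubling eventually brackets the root from above.
    while expected_distinct_folds(hi, sampled_families) < observed_folds:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_distinct_folds(mid, sampled_families) < observed_folds:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With synthetic inputs (not data from the study), generating the expected fold count for N = 650 and n = 1000 and then feeding it back to `estimate_total_folds` recovers N = 650 to within the bisection tolerance. A real analysis would also need to account for unequal fold sizes and sampling without replacement.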