A ‘periodic table’ for protein structures

Current structural genomics programs aim to determine systematically the structures of all proteins encoded in human and other genomes, providing a complete picture of the number and variety of protein structures that exist. Until now, estimates of this number have been based on the incomplete sample of structures currently known. These estimates have varied greatly (between 1,000 and 10,000; see for example refs 1 and 2), partly because of the limited sample size but also because of the difficulty of distinguishing one structure from another. This distinction is usually topological, based on the fold of the protein; however, in strict topological terms (neglecting intra-chain cross-links), protein chains are open strings and hence are all identical. To avoid this trivial result, topologies are determined by considering secondary links in the form of intra-chain hydrogen bonds (secondary structure) and tertiary links formed by the packing of secondary structures. However, small additions to, or losses of, structure can make large changes to these perceived topologies, and such subjective solutions are neither robust nor amenable to automation. Here I formalize both secondary and tertiary links to allow the rigorous and automatic definition of protein topology.

[1]  William R. Taylor, et al. Analysis and prediction of protein β-sheet structures by a combinatorial approach, 1980, Nature.

[2]  John P. Overington, et al. HOMSTRAD: A database of protein structure alignments for homologous families, 1998, Protein science: a publication of the Protein Society.

[3]  William R. Taylor, et al. Structure Comparison and Structure Patterns, 2000, J. Comput. Biol.

[4]  A V Finkelstein, et al. The classification and origins of protein folding patterns, 1990, Annual review of biochemistry.

[5]  Chris Sander, et al. Dali/FSSP classification of three-dimensional protein folds, 1997, Nucleic Acids Res.

[6]  W R Taylor, et al. Defining linear segments in protein structure, 2001, Journal of molecular biology.

[7]  Vishva M. Dixit, et al. RAIDD is a new 'death' adaptor molecule, 1997, Nature.

[8]  C. Chothia. One thousand families for the molecular biologist, 1992, Nature.

[9]  Janet M. Thornton, et al. Protein domain superfolds and superfamilies, 1994.

[10]  William R. Taylor, et al. Analysis and prediction of the packing of α-helices against a β-sheet in the tertiary structure of globular proteins, 1982.

[11]  C. Chothia, et al. New folds for all-β proteins, 1993.

[12]  A G Murzin, et al. SCOP: a structural classification of proteins database for the investigation of sequences and structures, 1995, Journal of molecular biology.

[13]  W R Taylor. Searching for the ideal forms of proteins, 2000, Biochemical Society transactions.

[14]  A. Lesk, et al. Principles determining the structure of beta-sheet barrels in proteins. I. A theoretical analysis, 1994, Journal of molecular biology.

[15]  Alexey G. Murzin, et al. General architecture of the α-helical globule, 1988.

[16]  D. T. Jones, et al. A new approach to protein fold recognition, 1992, Nature.

[17]  David C. Jones, et al. CATH--a hierarchic classification of protein domain structures, 1997, Structure.

[18]  O. Ptitsyn, et al. Why do globular proteins fit the limited set of folding patterns?, 1987, Progress in biophysics and molecular biology.

[19]  D. T. Jones, et al. A method for alpha-helical integral membrane protein fold prediction, 1994, Proteins.

[20]  W R Taylor, et al. Protein structural domain identification, 1999, Protein engineering.

[21]  William R Taylor. Protein Structure Comparison Using Bipartite Graph Matching and Its Application to Protein Structure Classification, 2002, Molecular & Cellular Proteomics.

[22]  D T Jones, et al. A systematic comparison of protein structure classifications: SCOP, CATH and FSSP, 1999, Structure.