Targeting novel folds for structural genomics

The ultimate goal of structural genomics is to obtain the structure of every protein encoded by a genome in order to determine gene function. Because of cost and time limitations, it remains impractical to solve the structure of every gene product experimentally. Up to a point, reasonably accurate three‐dimensional structures can be deduced for proteins with homologous sequences by comparative modeling. Beyond this, fold recognition or threading methods can be used for proteins showing little homology to any protein of known structure, although these methods are relatively time‐consuming and limited by the library of template folds currently available. It is therefore appropriate to develop methods that increase our knowledge base and expand our fold libraries by earmarking potentially “novel” folds for experimental structure determination. How can we sift through proteomic data rapidly and yet reliably identify novel folds as targets for structural genomics? We have analyzed a number of simple methods that discriminate between “novel” and “known” folds. We propose that simple alignments of predicted secondary structure elements could potentially be more selective than both a simple fold recognition method (GenTHREADER) and standard sequence alignment at finding novel folds when sequences show no detectable homology to proteins with known structures. Proteins 2002;48:44–52. © 2002 Wiley‐Liss, Inc.
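
The idea of aligning secondary structure elements can be illustrated with a minimal sketch. It is not the authors' implementation: the three-state alphabet (H, E, C), the minimum element length, the scoring values, and the function names `to_elements` and `align_score` are all assumptions chosen for illustration. A per-residue prediction is collapsed into a string of helix and strand elements, and two such strings are compared with a standard global (Needleman-Wunsch style) alignment score.

```python
from itertools import groupby


def to_elements(ss, min_len=3):
    """Collapse a per-residue secondary structure string into elements.

    Assumes a three-state alphabet: H (helix), E (strand), C (coil).
    Runs of H or E with at least `min_len` residues become one element;
    coil and shorter runs are dropped. `min_len` is an illustrative choice.
    """
    return "".join(
        state
        for state, run in groupby(ss)
        if state in "HE" and len(list(run)) >= min_len
    )


def align_score(a, b, match=2, mismatch=-1, gap=-1):
    """Global alignment score of two element strings.

    Uses a linear gap penalty; the scoring values are hypothetical,
    not taken from the paper.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical query prediction compared with two hypothetical library entries.
query = to_elements("CCHHHHHHCCEEEEECCCHHHHHHCC")
library = {"fold_a": "HEH", "fold_b": "EEEE"}
scores = {name: align_score(query, elems) for name, elems in library.items()}
```

Under these assumptions, a query whose best score against every library entry stays low would be a candidate "novel" fold. Where to set such a cutoff is left open here, since the abstract does not specify one.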
