Protein structure prediction and analysis as a tool for functional genomics.

Bioinformatic analyses of whole genome sequences highlight the problem of identifying the biochemical and cellular functions of the many gene products that are at present uncharacterised. Determination of their three-dimensional structures, either experimentally or by prediction, provides a powerful tool to address function, since it is at this level that biological activity is expressed. Here, we discuss the current approaches to protein structure prediction from sequence data, including the ab initio prediction of new folds, methods of fold recognition and comparative modelling based on homology. The value and limitations of such models are also explored. A major factor for the future will be the growth of the database of experimentally determined protein structures through structural genomics projects. We discuss the prospects for structural genomics, together with our experience in a pilot structural genomics project focused on proteins from Mycobacterium tuberculosis, the cause of tuberculosis (TB).
