Automated structure prediction of weakly homologous proteins on a genomic scale.

We have developed TASSER, a hierarchical approach to protein structure prediction. It consists of template identification by threading, followed by tertiary structure assembly via the rearrangement of continuous template fragments, guided by an optimized Cα and side-chain-based potential driven by threading-based predicted tertiary restraints. TASSER was applied to a comprehensive benchmark set of 1,489 medium-sized proteins in the Protein Data Bank. With homologues excluded, in 927 cases the templates identified by our threading algorithm PROSPECTOR_3 have a root-mean-square deviation (RMSD) from native of <6.5 Å with approximately 80% alignment coverage. After template reassembly, this number increases to 1,172, a significant and systematic improvement of the final models over the initial template alignments. Significant improvements in loop modeling are also demonstrated. We then applied TASSER to the 1,360 medium-sized ORFs in the Escherichia coli genome; approximately 920 can be predicted with high accuracy based on confidence criteria established in the Protein Data Bank benchmark. These results, from our unprecedented comprehensive folding benchmark on all protein categories, provide a reliable basis for the application of TASSER to structural genomics, especially to proteins of low sequence identity to solved protein structures.
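To put the counts above in proportion, the following minimal sketch converts them into percentages. Only the five counts come from the abstract; the constant names and the helper functions `percent` and `summarize` are illustrative assumptions, and the E. coli figure is approximate because the abstract gives it as "approximately 920".

```python
# Counts quoted in the abstract; all names below are illustrative assumptions.
BENCHMARK_TOTAL = 1489   # medium-sized benchmark proteins from the PDB
TEMPLATES_OK = 927       # threading templates with RMSD < 6.5 Å to native
MODELS_OK = 1172         # same criterion after template reassembly
ECOLI_TOTAL = 1360       # medium-sized ORFs in the E. coli genome
ECOLI_CONFIDENT = 920    # approximate count predicted with high accuracy


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def summarize():
    """Express the abstract's counts as fractions of their totals."""
    return {
        "templates_pct": percent(TEMPLATES_OK, BENCHMARK_TOTAL),
        "models_pct": percent(MODELS_OK, BENCHMARK_TOTAL),
        "extra_targets": MODELS_OK - TEMPLATES_OK,
        "ecoli_pct": percent(ECOLI_CONFIDENT, ECOLI_TOTAL),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these numbers, reassembly brings 245 additional benchmark targets under the 6.5 Å threshold, raising the fraction from roughly 62% to roughly 79% of the benchmark, and about two thirds of the E. coli ORFs fall in the high-confidence category.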
