Methodologies for target selection in structural genomics.

As the number of completely sequenced genomes keeps growing, unknown areas of the protein space are revealed and new horizons open up. Most of this information will be fully appreciated only when structural information about the encoded proteins becomes available. The goal of structural genomics is to direct large-scale efforts of protein structure determination so as to increase their impact. This review focuses on current approaches in structural genomics for selecting representative proteins as targets for structure determination. We discuss the concept of representative structures/folds, the current methodologies for identifying those proteins, and computational techniques for identifying proteins that are expected to adopt new structural folds.
