ProPhylER: a curated online resource for protein function and structure based on evolutionary constraint analyses.

ProPhylER (Protein Phylogeny and Evolutionary Rates) is a next-generation curated proteome resource that uses comparative sequence analysis to predict constraint and mutation impact for eukaryotic proteins. It aims to bring the predictive power of evolutionary constraint analyses to any research program in which protein function and structure are relevant. ProPhylER currently contains nearly 9000 clusters of related proteins, comprising more than 200,000 sequences. It serves data through two interfaces: the "ProPhylER Interface" displays predictive analyses in sequence space, and the "CrystalPainter" maps evolutionary constraints onto solved protein structures. Here we summarize ProPhylER's data content and analysis pipeline, demonstrate the use of its interfaces, and evaluate its unique regional analysis of evolutionary constraint. The high accuracy of the regional analysis complements the high resolution of the single-site analysis; together they guide and inform structure-function investigations and predict the impact of polymorphisms.
