The Statistical Mechanics Approach to Protein Sequence Data: Beyond Contact Prediction

The recent application of models from inverse statistical mechanics to protein sequence data has been a great success. In my thesis, I will build upon these models and also use them beyond their original aim of residue contact prediction. This includes improving contact prediction itself by extending the models, applying the methods in the wider scope of protein interaction networks, and predicting further biological characteristics from the extracted information.
