Bayesian statistical approach for protein residue-residue contact prediction

Despite continuous efforts to automate experimental structure determination and to select targets systematically in structural genomics projects, the gap between the number of known amino acid sequences and the number of solved 3D protein structures keeps widening. DNA sequencing technologies are advancing at an extraordinary pace, steadily increasing throughput while reducing costs, whereas protein structure determination remains labour-intensive, time-consuming and expensive. This trend underlines the importance of complementary computational approaches for bridging the so-called sequence-structure gap. About half of all protein families lack structural annotation and are therefore not amenable to techniques that infer protein structure from homologs. These families can be addressed by de novo structure prediction approaches, which in practice are often limited by the immense computational cost of searching the conformational space for the lowest-energy conformation. Improved predictions of contacts between amino acid residues have been shown to constrain the overall protein fold sufficiently to extend the applicability of de novo methods to larger proteins. Residue-residue contact prediction is based on the idea that selection pressure on protein structure and function can lead to compensatory mutations between spatially close residues. This leaves correlation signatures that can be traced in the evolutionary record. Despite the success of contact prediction methods, several challenges remain. The most evident limitation is the requirement for deep alignments, which excludes most of the protein families without associated structural information, precisely those that are the focus of contact-guided de novo structure prediction. The heuristics applied by current contact prediction methods pose another challenge, since they discard available coevolutionary information.
This work presents two approaches for addressing these limitations of contact prediction methods. First, instead of inferring evolutionary couplings by maximizing the pseudo-likelihood, I maximize the full likelihood of the statistical model for protein sequence families. This approach achieved precision comparable to that of pseudo-likelihood methods, with minor improvements for protein families with few homologous sequences. Second, a Bayesian statistical approach has been developed that provides posterior probability estimates for residue-residue contacts and eliminates the use of heuristics. The full information of coevolutionary signatures is exploited by explicitly modelling the distribution of statistical couplings that reflects the nature of residue-residue interactions. Surprisingly, the posterior probabilities do not directly translate into more precise predictions than those obtained by pseudo-likelihood methods combined with prior knowledge. However, the Bayesian framework offers a statistically clean and theoretically solid treatment of the contact prediction problem. This flexible and transparent framework provides a convenient starting point for further developments, such as integrating more complex prior knowledge. The model can also easily be extended towards the derivation of probability estimates for residue-residue distances to enhance the precision of predicted structures.
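To make the distinction between the two objectives concrete, the following is a minimal sketch, not the implementation used in this work. It assumes the statistical model is a Potts-type model with single-site fields `v` and symmetric pairwise couplings `w`; all function and variable names are hypothetical. The full log-likelihood needs the partition function, computed here by brute-force enumeration, which is only feasible for toy sizes. The pseudo-log-likelihood replaces it with one cheap normalization per position.

```python
import itertools
import numpy as np

def log_potential(x, v, w):
    """Unnormalized log-probability of sequence x under a Potts-type model.

    v has shape (L, q): single-site fields.
    w has shape (L, L, q, q): couplings, assumed symmetric,
    i.e. w[j, i, b, a] == w[i, j, a, b].
    """
    L = len(x)
    fields = sum(v[i, x[i]] for i in range(L))
    pairs = sum(w[i, j, x[i], x[j]] for i in range(L) for j in range(i + 1, L))
    return fields + pairs

def full_log_likelihood(X, v, w):
    """Exact log-likelihood; enumerates all q**L sequences (toy sizes only)."""
    L, q = v.shape
    states = itertools.product(range(q), repeat=L)
    log_z = np.logaddexp.reduce([log_potential(s, v, w) for s in states])
    return sum(log_potential(x, v, w) for x in X) - len(X) * log_z

def pseudo_log_likelihood(X, v, w):
    """Sum over sequences and positions of log p(x_i | all other positions)."""
    L, q = v.shape
    total = 0.0
    for x in X:
        for i in range(L):
            # Conditional logits for every state at position i, rest fixed.
            logits = v[i].astype(float)
            for j in range(L):
                if j != i:
                    logits = logits + w[i, j, :, x[j]]
            total += logits[x[i]] - np.logaddexp.reduce(logits)
    return total
```

When all couplings are zero the model factorizes over positions and the two objectives coincide; they differ once couplings are non-zero, which is where the choice of objective matters for inferring couplings.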
