Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences

Abstract

Direct-coupling analysis is a group of methods to harvest information about coevolving residues in a protein family by learning a generative model in an exponential family from data. In protein families of realistic size, this learning can only be done approximately, and there is a trade-off between inference precision and computational speed. Here we show that plmDCA, a previously introduced ℓ2-regularized pseudolikelihood maximization method, can be modified so as to be easily parallelizable as well as inherently faster on a single processor, with a negligible difference in accuracy. We test the new incarnation of the method on 143 protein family/structure pairs from the Protein Families database (PFAM), one of the larger tests of this class of algorithms to date.
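What makes the modified method parallelizable is that pseudolikelihood maximization can be split into one independent, ℓ2-regularized problem per alignment column. Column r gets its own fields h_r and couplings J_ri. These are fitted by minimizing the negative log of P(σ_r | all other columns), averaged over sequences, plus λ_h·|h_r|² + λ_J·Σ_i |J_ri|². Each coupling J_ij is then estimated twice, once from column i's problem and once from column j's, and the two estimates are combined. The Python sketch below illustrates that structure on integer-encoded toy sequences. It is not the authors' plmDCA code. The function names (`site_objective`, `fit_site`, `fit_all`, `coupling_scores`), the regularization strengths, the plain gradient-descent optimizer and the simple averaging of the two coupling estimates are assumptions made for illustration. Sequence reweighting and the average-product correction, which DCA pipelines typically apply, are left out.

```python
import math

# Sketch of asymmetric pseudolikelihood maximization for a Potts model.
# Assumptions (not taken from plmDCA): sequences are tuples of ints in
# range(q); no sequence reweighting; plain gradient descent; illustrative
# regularization strengths lam_h and lam_J.


def site_objective(r, seqs, q, h, J, lam_h, lam_J):
    """Regularized negative conditional log-likelihood of column r, with gradient.

    h[a] is the field on state a at site r; J[i][a][b] couples state a at
    site r with state b at site i (for every i != r).
    """
    M = len(seqs)
    f = 0.0
    gh = [0.0] * q
    gJ = {i: [[0.0] * q for _ in range(q)] for i in J}
    for s in seqs:
        # Log-weight of each candidate state a at site r, given the rest of s.
        e = [h[a] + sum(J[i][a][s[i]] for i in J) for a in range(q)]
        emax = max(e)
        logz = emax + math.log(sum(math.exp(x - emax) for x in e))
        f -= (e[s[r]] - logz) / M
        for a in range(q):
            # d/de_a of -log P(s_r | rest) is P(a | rest) - [a == s_r].
            d = (math.exp(e[a] - logz) - (1.0 if a == s[r] else 0.0)) / M
            gh[a] += d
            for i in J:
                gJ[i][a][s[i]] += d
    # l2 penalties on the fields and couplings of this column only.
    f += lam_h * sum(x * x for x in h)
    for a in range(q):
        gh[a] += 2 * lam_h * h[a]
    for i in J:
        for a in range(q):
            for b in range(q):
                f += lam_J * J[i][a][b] ** 2
                gJ[i][a][b] += 2 * lam_J * J[i][a][b]
    return f, gh, gJ


def fit_site(r, seqs, q, lam_h=0.01, lam_J=0.01, steps=200):
    """Fit column r's fields and couplings; uses no other column's parameters."""
    N = len(seqs[0])
    # Each logit depends on N parameters, so the data term's gradient is at
    # most N-Lipschitz; a step of 1/N keeps every step a descent step for small lam.
    lr = 1.0 / N
    h = [0.0] * q
    J = {i: [[0.0] * q for _ in range(q)] for i in range(N) if i != r}
    for _ in range(steps):
        f, gh, gJ = site_objective(r, seqs, q, h, J, lam_h, lam_J)
        for a in range(q):
            h[a] -= lr * gh[a]
        for i in J:
            for a in range(q):
                for b in range(q):
                    J[i][a][b] -= lr * gJ[i][a][b]
    return h, J, f  # f is the objective at the last evaluated point


def fit_all(seqs, q, **kw):
    # The N column problems are independent, so this loop could be mapped
    # over a process pool; it is kept serial here for simplicity.
    return [fit_site(r, seqs, q, **kw) for r in range(len(seqs[0]))]


def coupling_scores(fits, q):
    """Average the two estimates of each J_ij, center it, take its Frobenius norm."""
    N = len(fits)
    scores = {}
    for i in range(N):
        for j in range(i + 1, N):
            Ji, Jj = fits[i][1][j], fits[j][1][i]  # indexed [a_i][a_j] and [a_j][a_i]
            Js = [[0.5 * (Ji[a][b] + Jj[b][a]) for b in range(q)] for a in range(q)]
            # Zero-sum gauge: remove row and column means before taking the norm.
            rm = [sum(Js[a]) / q for a in range(q)]
            cm = [sum(Js[a][b] for a in range(q)) / q for b in range(q)]
            gm = sum(rm) / q
            scores[(i, j)] = math.sqrt(sum(
                (Js[a][b] - rm[a] - cm[b] + gm) ** 2
                for a in range(q) for b in range(q)))
    return scores


if __name__ == "__main__":
    # Toy alignment: columns 0 and 1 always agree; column 2 is independent.
    seqs = [(a, a, c) for a in range(2) for c in range(2)] * 5
    scores = coupling_scores(fit_all(seqs, q=2), q=2)
    print(max(scores, key=scores.get))  # (0, 1)
```

On the toy alignment, the centered couplings between column 2 and the other two columns stay at essentially zero, so the pair (0, 1) ranks highest. Because `fit_site` never reads another column's parameters, the per-column fits can be distributed across processors without any communication until the final combination step.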

Magnus Ekeberg | Erik Aurell | Tuomo Hartonen