Minimum message length inference of secondary structure from protein coordinate data

Motivation: Secondary structure underpins the folding pattern and architecture of most proteins, so accurate assignment of secondary structure elements is an important problem. Although many approximate solutions to the secondary structure assignment problem exist, the problem itself has resisted a consistent and mathematically rigorous definition. Several comparative studies have highlighted major disagreements in how the available methods define secondary structure and assign it to coordinate data.

Results: We report a new method to infer secondary structure based on the Bayesian method of minimum message length (MML) inference. It treats assignments of secondary structure as hypotheses that explain the given coordinate data, and seeks to maximize the joint probability of a hypothesis and the data (equivalently, to minimize the length of a two-part message that states the hypothesis and then the data given that hypothesis). There is a natural null hypothesis, and any assignment that cannot improve on it is unacceptable. We developed a program, SST, based on this approach and compared it with popular programs such as DSSP and STRIDE, among others. Our evaluation suggests that SST gives reliable assignments even on low-resolution structures.

Availability: http://www.csse.monash.edu.au/~karun/sst

Contact: arun.konagurthu@monash.edu (or lloyd.allison@monash.edu)
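The selection criterion described in the Results can be sketched as follows. This is a minimal illustration, assuming that each candidate assignment H is scored by a two-part message length I(H) + I(D|H) = -log2 P(H) - log2 P(D|H) = -log2 P(H, D), so that maximizing the joint probability is the same as minimizing the total length in bits, and that the null hypothesis has some known message length for the same data. The helper names (`two_part_length`, `best_hypothesis`) and the toy probabilities are hypothetical. SST's actual encoding of coordinates and secondary structure is not reproduced here.

```python
import math


def message_length_bits(prob: float) -> float:
    """Optimal code length, in bits, of an event with probability prob."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(prob)


def two_part_length(p_hypothesis: float, p_data_given_hypothesis: float) -> float:
    """I(H) + I(D|H) = -log2 P(H) - log2 P(D|H) = -log2 P(H, D)."""
    return message_length_bits(p_hypothesis) + message_length_bits(p_data_given_hypothesis)


def best_hypothesis(candidates: dict[str, tuple[float, float]], null_length: float):
    """Return (name, bits) of the shortest two-part message.

    candidates maps a (hypothetical) hypothesis name to (P(H), P(D|H)).
    If no candidate is strictly shorter than the null hypothesis,
    ("null", null_length) is returned: such assignments are rejected.
    """
    best_name, best_len = "null", null_length
    for name, (p_h, p_d_given_h) in candidates.items():
        length = two_part_length(p_h, p_d_given_h)
        if length < best_len:
            best_name, best_len = name, length
    return best_name, best_len


if __name__ == "__main__":
    # Toy numbers only: the null message encodes the data in 100 bits.
    toy = {
        "helix": (2**-10, 2**-80),   # 10 + 80 = 90 bits, beats the null
        "strand": (2**-12, 2**-95),  # 12 + 95 = 107 bits, worse than the null
    }
    name, bits = best_hypothesis(toy, null_length=100.0)
    print(f"{name} {bits:.1f}")  # prints "helix 90.0"
```

A hypothesis that is cheap to state but explains the coordinates poorly, or one that fits well but costs too many bits to describe, both lose to a shorter total message; anything no shorter than the null message is treated as unacceptable.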
