On Representing Protein Folding Patterns Using Non-Linear Parametric Curves

Proteins fold into complex three-dimensional shapes. Simplified representations of these shapes are central to rationalising, comparing, classifying, and interpreting protein structures. Traditional methods of abstracting protein folding patterns represent the standard secondary structural elements (helices and strands of sheet) as line segments, which discards a significant proportion of the structural information. The motivation of this research is to derive mathematically rigorous and biologically meaningful abstractions of protein folding patterns that maximize the economy of structural description and minimize the loss of structural information. We report a novel method that describes a protein as a non-overlapping set of parametric three-dimensional curves of varying length and complexity. Our approach is grounded in information theory and uses the statistical framework of minimum message length (MML) inference. We demonstrate that this non-linear abstraction supports efficient and effective comparison of protein folding patterns on a large scale.
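
To make the idea of a parametric curve abstraction concrete, the following is a minimal sketch that fits a single cubic Bezier curve to a run of three-dimensional points (for example, consecutive C-alpha coordinates) by least squares and reports the worst-case deviation. The abstract does not name a curve family or a fitting procedure, so the choice of cubic Bezier curves, the chord-length parametrisation, and the function names `fit_cubic_bezier` and `max_deviation` are all assumptions made for illustration. The MML-based choice of how many curves to use and where to segment the chain is not shown.

```python
import math

def bernstein3(t):
    """Return the four cubic Bernstein basis values at parameter t."""
    u = 1.0 - t
    return (u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t)

def chord_length_params(points):
    """Assign each point a parameter in [0, 1] proportional to path length."""
    dists = [0.0]
    for p, q in zip(points, points[1:]):
        dists.append(dists[-1] + math.dist(p, q))
    total = dists[-1]
    if total == 0.0:
        raise ValueError("all points coincide")
    return [d / total for d in dists]

def fit_cubic_bezier(points):
    """Least-squares cubic Bezier with end control points pinned to the data.

    Hypothetical helper: the two inner control points are found by solving
    the 2x2 normal equations separately for each coordinate.
    """
    if len(points) < 4:
        raise ValueError("need at least four points")
    p0, p3 = points[0], points[-1]
    ts = chord_length_params(points)
    a11 = a12 = a22 = 0.0
    b1 = [0.0, 0.0, 0.0]
    b2 = [0.0, 0.0, 0.0]
    for t, q in zip(ts, points):
        w0, w1, w2, w3 = bernstein3(t)
        a11 += w1 * w1
        a12 += w1 * w2
        a22 += w2 * w2
        for k in range(3):
            # Residual left after removing the fixed end-point contributions.
            r = q[k] - w0 * p0[k] - w3 * p3[k]
            b1[k] += w1 * r
            b2[k] += w2 * r
    det = a11 * a22 - a12 * a12
    p1 = tuple((a22 * b1[k] - a12 * b2[k]) / det for k in range(3))
    p2 = tuple((a11 * b2[k] - a12 * b1[k]) / det for k in range(3))
    return [tuple(p0), p1, p2, tuple(p3)], ts

def max_deviation(control, ts, points):
    """Largest distance between a data point and the curve at its parameter."""
    worst = 0.0
    for t, q in zip(ts, points):
        w = bernstein3(t)
        c = tuple(sum(w[j] * control[j][k] for j in range(4)) for k in range(3))
        worst = max(worst, math.dist(c, q))
    return worst
```

In a scheme of the kind the abstract outlines, a deviation measure like this would feed into the cost of stating the data given the curve, which is then traded off against the cost of stating the curve itself.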
