Cluster Protein Structures using Recurrence Quantification Analysis on Coordinates of Alpha-Carbon Atoms of Proteins

The 3-dimensional coordinates of the alpha-carbon atoms of proteins are used to distinguish protein structural classes by means of recurrence quantification analysis (RQA). We consider two independent variables obtained from RQA of the alpha-carbon coordinates, %determ1 and %determ2, as defined by Webber et al. [C.L. Webber Jr., A. Giuliani, J.P. Zbilut, A. Colosimo, Proteins Struct. Funct. Genet. 44 (2001) 292]. The variable %determ2 is used to define two new variables, %determ21 and %determ22. The three variables %determ1, %determ21 and %determ22 then span a 3-dimensional variable space in which each protein is represented by a point. The points corresponding to proteins from the α, β, α+β and α/β structural classes fall into different regions of this space. To assess the clustering of the selected proteins quantitatively, Fisher's discriminant algorithm is used. The numerical results indicate that the discriminant accuracies are high.
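To make the RQA step concrete, the following is a minimal sketch of a generic percent-determinism calculation on alpha-carbon coordinates. It is not the exact definition of %determ1 or %determ2 from Webber et al., which is not reproduced in this abstract. The function names, the distance radius and the minimum line length are illustrative assumptions.

```python
import math

def recurrence_matrix(coords, radius):
    """Mark residue pairs (i, j) whose alpha-carbon distance is <= radius.

    `radius` is a hypothetical cutoff chosen for illustration only.
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) <= radius for j in range(n)]
            for i in range(n)]

def percent_determinism(rec, min_line=2):
    """Percentage of recurrent points above the main diagonal that lie on
    diagonal lines of at least `min_line` consecutive points.

    `min_line=2` is an assumed default, not a value taken from the paper.
    """
    n = len(rec)
    total = 0      # all recurrent points in the upper triangle
    on_lines = 0   # recurrent points that belong to long enough diagonals
    for k in range(1, n):          # k-th diagonal above the main diagonal
        run = 0                    # length of the current diagonal run
        for i in range(n - k):
            if rec[i][i + k]:
                total += 1
                run += 1
            else:
                if run >= min_line:
                    on_lines += run
                run = 0
        if run >= min_line:        # close a run that reaches the edge
            on_lines += run
    return 100.0 * on_lines / total if total else 0.0

# Toy chain: five points on a straight line, one unit apart.
chain = [(float(i), 0.0, 0.0) for i in range(5)]
det = percent_determinism(recurrence_matrix(chain, radius=1.5))
print(det)  # 100.0: every recurrent pair lies on one diagonal of length 4
```

In a real analysis the coordinates would come from the alpha-carbon records of a structure file, and each protein's RQA variables would then be passed to a discriminant step such as Fisher's.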
