Feature Extraction of Protein Contact Maps from Protein 3D-Coordinates

This work proposes an alternative way of solving challenging problems in computational biology, such as protein secondary structure assignment, protein fold identification/recognition, protein fold signatures, and the contact map overlap problem, by exploiting the idea that proteins belonging to the same fold have similar contact maps. Pattern mining of contact maps is conducted to extract features that pertain to fold information. Instead of extracting features from the primary amino acid sequence, we propose to extract pattern features from protein contact maps. Because existing work in the literature predicts contact maps from the primary amino acid sequence, we further propose that pattern features taken from predicted contact maps would lead to an ab initio method. Protein secondary structure assignment achieves an accuracy of 76% on the RS126 data set, remains on par with the best algorithms with up to 10% noise, and falls to 66% at 15% noise.
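As a rough illustration of the starting point for this approach, the sketch below builds a binary contact map from 3D coordinates. The abstract does not state how contacts are defined, so the choices here are assumptions: one representative point per residue (for example the C-alpha atom), a Euclidean distance cutoff of 8 Å, and the hypothetical function name `contact_map`. The coordinates are toy values, not taken from any real protein.

```python
import math


def contact_map(coords, threshold=8.0):
    """Return a symmetric 0/1 matrix marking residue pairs within `threshold`.

    coords    -- list of (x, y, z) tuples, one per residue (assumed C-alpha)
    threshold -- distance cutoff in angstroms (8.0 is an assumed value)
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Two residues are "in contact" when their distance is within the cutoff.
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy input: four points on a straight line, spaced 3.8 angstroms apart.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(coords)
for row in cmap:
    print(row)
```

In this sketch the diagonal is left at zero, so a residue is not counted as being in contact with itself. Pattern mining for fold-related features would then operate on matrices of this kind; that step is not shown here because the abstract does not describe it in detail.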
