Estimation of Position Specific Energy as a Feature of Protein Residues from Sequence Alone for Structural Classification

Features computed from the primary amino acid sequence of a protein are crucial for inducing machine learning models that can accurately predict three-dimensional protein structure. Existing protein structure prediction problems need features that capture the complexity of molecular-level interactions. To this end, we propose a novel approach for estimating the position specific estimated energy (PSEE) of a residue from contact energy and predicted relative solvent accessibility (RSA), and we demonstrate that PSEE can be reasonably estimated from sequence information alone. PSEE helps identify the structured and the unstructured (intrinsically disordered) regions of a protein, which correspond to favorable and unfavorable energy, respectively, separated by an appropriate threshold. Our most intriguing empirical finding is that the PSEE feature can effectively classify disordered versus ordered residues and can separate residues of different secondary structure types by computing the constituent energies. The PSEE value of each amino acid correlates strongly with the hydrophobicity of that amino acid. PSEE can also be used to detect critical binding regions that undergo disorder-to-order transitions to perform crucial biological functions. As an application to disorder prediction, we rigorously tested a support vector machine model trained on a feature set that includes PSEE and found that it consistently outperforms a model trained on the same feature set with PSEE removed. In addition, the new disorder predictor, DisPredict2, performs competitively in predicting protein disorder when compared with six existing disordered protein predictors.
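The threshold-based separation of ordered and disordered regions described above can be illustrated with a minimal Python sketch. It is not the authors' PSEE computation: the per-residue energy values, the threshold, the sign convention (lower energy assumed to be more favorable), and the helper names `label_residues` and `regions` are all hypothetical and serve only to show how a per-residue energy profile might be turned into contiguous region calls.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def label_residues(energies: Sequence[float], threshold: float) -> List[str]:
    """Label a residue 'ordered' when its energy is below the threshold.

    Assumption: lower (more negative) energy is more favorable.
    """
    return ["ordered" if e < threshold else "disordered" for e in energies]


def regions(labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-residue labels into (label, start, end) runs, 0-based inclusive."""
    out = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        out.append((label, start, start + length - 1))
        start += length
    return out


# Hypothetical per-residue energies and threshold, for illustration only.
toy_energies = [-0.8, -0.6, -0.7, 0.2, 0.4, 0.1, -0.5]
toy_threshold = -0.3

labels = label_residues(toy_energies, toy_threshold)
print(regions(labels))
```

In practice the threshold would have to be chosen empirically on annotated data; the value used here carries no meaning.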
