DeepCNF-D: Predicting Protein Order/Disorder Regions by Weighted Deep Convolutional Neural Fields

Intrinsically disordered proteins or protein regions are involved in key biological processes, including regulation of transcription, signal transduction, and alternative splicing. Accurately predicting order/disorder regions ab initio from the protein sequence is a prerequisite for further analysis of the functions and mechanisms of these disordered regions. This work presents a learning method, weighted DeepCNF (Deep Convolutional Neural Fields), which improves the accuracy of order/disorder prediction in three ways: it exploits long-range sequential information, it models the interdependency between adjacent order/disorder labels, and it assigns a different weight to each label during training and prediction to address label imbalance. Evaluated on the CASP9 and CASP10 targets, our method obtains AUC values of 0.855 and 0.898, respectively, which are higher than those of state-of-the-art single ab initio predictors.
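The method combines three ingredients: convolutional layers that let each residue see a wide sequence window, a CRF layer that couples adjacent order/disorder labels, and per-label weights that counter the scarcity of disordered residues. The NumPy sketch below shows one way these pieces can fit together. It rests on several assumptions. A single convolution layer stands in for the deep stack. The CRF has only unary and pairwise (transition) potentials. The label weighting is written as a weighted labelwise log-likelihood over forward-backward marginals. The names `conv_layer`, `crf_marginals`, `weighted_labelwise_nll`, and `brute_force_marginals`, the toy sizes, and the weight values are hypothetical. This is not the authors' implementation or their exact objective.

```python
import itertools
import numpy as np

ORDER, DISORDER = 0, 1  # label indices (K = 2)

def conv_layer(x, kernel, bias):
    """One 1-D convolution along the sequence, 'same' zero padding, then ReLU.

    x: (L, d_in) per-residue features; kernel: (w, d_in, d_out) with odd w.
    Stacking such layers widens the window each residue sees (long-range context).
    """
    w = kernel.shape[0]
    pad = w // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.stack([np.einsum("wd,wdo->o", xp[i:i + w], kernel)
                    for i in range(x.shape[0])]) + bias
    return np.maximum(out, 0.0)

def crf_marginals(unary, trans):
    """Per-position label marginals of a linear-chain CRF via forward-backward.

    unary: (L, K) label scores; trans: (K, K) scores for adjacent label pairs
    (previous label -> next label). Computed in log space for stability.
    """
    L, K = unary.shape
    alpha = np.zeros((L, K))
    beta = np.zeros((L, K))
    alpha[0] = unary[0]
    for i in range(1, L):
        # alpha[i, k] = unary[i, k] + logsumexp_j(alpha[i-1, j] + trans[j, k])
        alpha[i] = unary[i] + np.logaddexp.reduce(alpha[i - 1][:, None] + trans, axis=0)
    for i in range(L - 2, -1, -1):
        # beta[i, j] = logsumexp_k(trans[j, k] + unary[i+1, k] + beta[i+1, k])
        beta[i] = np.logaddexp.reduce(trans + (unary[i + 1] + beta[i + 1])[None, :], axis=1)
    log_z = np.logaddexp.reduce(alpha[-1])
    return np.exp(alpha + beta - log_z)

def weighted_labelwise_nll(marg, labels, label_weights):
    """Hypothetical weighted objective: -sum_i w[y_i] * log P(y_i | x).

    A larger weight on DISORDER counteracts the order/disorder label imbalance.
    """
    p = marg[np.arange(len(labels)), labels]
    return -np.sum(label_weights[labels] * np.log(p))

def brute_force_marginals(unary, trans):
    """Exact marginals by enumerating all K**L label sequences (tiny L only)."""
    L, K = unary.shape
    marg = np.zeros((L, K))
    total = 0.0
    for ys in itertools.product(range(K), repeat=L):
        score = sum(unary[i, y] for i, y in enumerate(ys))
        score += sum(trans[a, b] for a, b in zip(ys, ys[1:]))
        p = np.exp(score)
        total += p
        for i, y in enumerate(ys):
            marg[i, y] += p
    return marg / total

# Toy usage: random features for a 6-residue sequence.
rng = np.random.default_rng(0)
L, d_in, d_hid = 6, 4, 8
x = rng.normal(size=(L, d_in))
h = conv_layer(x, rng.normal(size=(5, d_in, d_hid)) * 0.3, np.zeros(d_hid))
unary = h @ rng.normal(size=(d_hid, 2))         # (L, 2) order/disorder scores
trans = np.array([[1.0, -1.0], [-1.0, 1.0]])    # rewards runs of the same label
marg = crf_marginals(unary, trans)
labels = np.array([ORDER, ORDER, DISORDER, DISORDER, DISORDER, ORDER])
weights = np.array([1.0, 3.0])                  # hypothetical: upweight disorder
loss = weighted_labelwise_nll(marg, labels, weights)
```

The brute-force enumeration only checks the forward-backward marginals on a short toy sequence, since it is exponential in sequence length. The per-residue disorder marginals `marg[:, DISORDER]` are the kind of scores from which an ROC curve and its AUC, as reported for CASP9 and CASP10, would be computed.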
