Identifying short disorder-to-order binding regions in disordered proteins with a deep convolutional neural network method

Molecular recognition features (MoRFs) are key functional regions of intrinsically disordered proteins (IDPs), which play important roles in the molecular interaction networks of cells and are implicated in many serious human diseases. Identifying MoRFs is essential for both functional studies of IDPs and drug design. This study applies deep learning to develop a model that improves MoRF prediction. We propose a method named en_DCNNMoRF (ensemble deep convolutional neural network-based MoRF predictor). It combines the outputs of two independent deep convolutional neural network (DCNN) classifiers that use different feature sets. The first, DCNNMoRF1, describes protein sequences with a position-specific scoring matrix (PSSM) and 22 types of amino acid-related factors. The second, DCNNMoRF2, describes protein sequences with a PSSM and 13 types of amino acid indexes. Each classifier uses a DCNN with a novel two-dimensional attention mechanism, and an averaging strategy further processes the output probabilities of each DCNN model. Finally, en_DCNNMoRF combines the two models by averaging their final scores. When compared with other well-known tools on the same datasets, the accuracy of the proposed method was comparable to that of state-of-the-art methods. The related web server can be accessed freely via http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/en_MoRFs.php .
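
The two averaging steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the averaging strategy works, so the sketch assumes a centered sliding-window mean over per-residue probabilities. The function names, the window size of 3, and the example probabilities are all hypothetical.

```python
from typing import List, Sequence


def smooth_scores(probs: Sequence[float], window: int = 3) -> List[float]:
    """Replace each residue's probability with the mean over a centered window.

    Assumption: the abstract only mentions an "average strategy"; a sliding
    window mean (truncated at the sequence ends) is one plausible reading.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(probs)):
        lo = max(0, i - half)
        hi = min(len(probs), i + half + 1)
        smoothed.append(sum(probs[lo:hi]) / (hi - lo))
    return smoothed


def ensemble_scores(scores_a: Sequence[float], scores_b: Sequence[float]) -> List[float]:
    """Combine two per-residue score lists by taking their arithmetic mean."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same residues")
    return [(a + b) / 2.0 for a, b in zip(scores_a, scores_b)]


# Hypothetical per-residue MoRF probabilities from the two classifiers.
model1_probs = [0.0, 0.5, 1.0, 0.5, 0.0]  # stands in for DCNNMoRF1 output
model2_probs = [1.0, 1.0, 1.0, 0.0, 0.0]  # stands in for DCNNMoRF2 output

final_scores = ensemble_scores(
    smooth_scores(model1_probs),
    smooth_scores(model2_probs),
)
```

Under these assumptions, each residue's final score is the mean of the two smoothed model outputs, and a threshold on that score would then label residues as MoRF or non-MoRF.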
