Predicting FAD Interacting Residues with Feature Selection and Comprehensive Sequence Descriptors

The function of a flavoprotein is determined to a great extent by the binding sites on its surface that interact with flavin adenine dinucleotide (FAD). Malfunction or dysregulation of FAD binding can lead to a range of diseases. Accurately identifying FAD interacting residues (FIRs) therefore provides insight into the molecular mechanisms of flavoprotein-related biological processes and disease progression. In this paper, we propose a new computational method for identifying FIRs from protein sequences. We explore a variety of sequence-derived discriminative features, analyze how these features differ between FIRs and non-FIRs, and investigate the predictive capability of both individual features and feature combinations. A Relief algorithm followed by incremental feature selection (Relief-IFS) is then adopted to search for the optimal feature subset. Finally, a random forest (RF) module predicts FIRs based on the optimal features. In a 5-fold cross-validation test, the proposed method achieves a sensitivity of 0.847, a specificity of 0.933, an accuracy of 0.890, and a Matthews correlation coefficient (MCC) of 0.782, thereby outperforming previous methods. These results indicate that the method predicts FIRs effectively from sequence information alone.
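The four evaluation measures quoted above follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions. The function name `binary_metrics` and the example counts are illustrative assumptions, not taken from the paper: the counts are hypothetical and were chosen only so that sensitivity and specificity match the reported values on an assumed balanced set of FIRs and non-FIRs.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification measures from confusion counts.

    tp: FIRs predicted as FIRs          fn: FIRs predicted as non-FIRs
    tn: non-FIRs predicted as non-FIRs  fp: non-FIRs predicted as FIRs
    """
    sensitivity = tp / (tp + fn)            # fraction of true FIRs recovered
    specificity = tn / (tn + fp)            # fraction of non-FIRs rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts (assumption): 1000 FIRs and 1000 non-FIRs.
metrics = binary_metrics(tp=847, fn=153, tn=933, fp=67)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With a balanced set, accuracy is the mean of sensitivity and specificity, which is consistent with the reported 0.847, 0.933, and 0.890. The MCC from these made-up counts lands close to, but should not be read as reproducing, the reported 0.782.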
