Analysis and prediction of presynaptic and postsynaptic neurotoxins by Chou's general pseudo amino acid composition and motif features.

Presynaptic and postsynaptic neurotoxins are two important classes of neurotoxins isolated from the venoms of venomous animals, and they have proven potentially useful in neuroscience and pharmacology. As the number of toxin sequences in public databases grows, a computational method is needed for fast and accurate identification and classification of novel presynaptic and postsynaptic neurotoxins in large databases. In this study, a Multinomial Naive Bayes Classifier (MNBC) was developed to discriminate presynaptic from postsynaptic neurotoxins using several kinds of features. The Minimum Redundancy Maximum Relevance (MRMR) feature selection method was used to rank 400 pseudo amino acid (PseAA) compositions, and the 50 top-ranked PseAA compositions were selected to improve the prediction results. The motif features, the 400 PseAA compositions and the 50 selected PseAA compositions were combined and used as the input parameters of the MNBC. The best correlation coefficient (CC) value, 0.8213, was obtained when prediction quality was evaluated by the jackknife test. The algorithm presented in this study may become a useful tool for identifying presynaptic and postsynaptic neurotoxin sequences and may support in-depth investigation of their biological mechanisms.
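The workflow outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: the 400 PseAA compositions are taken to be the 20 x 20 dipeptide counts, the CC is taken to be the Matthews correlation coefficient, Laplace smoothing (`alpha`) is added to the classifier, and the MRMR ranking and motif features are left out entirely. All function names and the toy sequences are hypothetical.

```python
import math
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: the 400 features are the 20 x 20 dipeptide counts.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_counts(seq):
    """Count each adjacent residue pair; non-standard residues are skipped."""
    counts = [0] * len(DIPEPTIDES)
    for a, b in zip(seq, seq[1:]):
        i = INDEX.get(a + b)
        if i is not None:
            counts[i] += 1
    return counts


def train_mnb(X, y, alpha=1.0):
    """Multinomial naive Bayes with Laplace smoothing (alpha is an assumption)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + alpha * len(totals)
        log_prior = math.log(len(rows) / len(y))
        log_lik = [math.log((c + alpha) / denom) for c in totals]
        model[label] = (log_prior, log_lik)
    return model


def predict_mnb(model, x):
    """Return the label with the highest log posterior score."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(c * w for c, w in zip(x, log_lik))
    return max(sorted(model), key=score)


def jackknife_predictions(X, y):
    """Jackknife (leave-one-out) test: each sample is predicted by a model
    trained on all remaining samples."""
    preds = []
    for i in range(len(y)):
        model = train_mnb(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict_mnb(model, X[i]))
    return preds


def matthews_cc(y_true, y_pred, positive):
    """Matthews correlation coefficient for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up toy sequences, not real neurotoxins.
    seqs = ["ACACACAC", "ACACAC", "CACACA", "GGGGGG", "GGGGG", "GGGG"]
    labels = ["pre", "pre", "pre", "post", "post", "post"]
    X = [dipeptide_counts(s) for s in seqs]
    preds = jackknife_predictions(X, labels)
    print(preds, matthews_cc(labels, preds, positive="pre"))
```

A feature-selection step such as MRMR would sit between `dipeptide_counts` and `train_mnb`, keeping only the columns it ranks highest before the classifier is trained.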
