Using the SMOTE technique and hybrid features to predict the types of ion channel-targeted conotoxins.

Conotoxins that target different ion channels have distinct physiological functions and therapeutic potential. Accurately identifying the type of ion channel a conotoxin targets provides important clues to its physiological mechanism and pharmacological potential. In this study, we propose ICTCPred, a random forest-based predictor of the types of ion channel-targeted conotoxins. It uses hybrid features that combine CTD (Composition, Transition, and Distribution), g-Gap DC (g-Gap Dipeptide Composition), PP (Physicochemical Properties), and SSI (Secondary Structure Information). To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is applied. Using each of these feature spaces individually, the average accuracy of ICTCPred lies in the range 0.729-0.886, which indicates the discriminative power of the features. Using the hybrid feature space of CTD, g-Gap DC, PP, and SSI, ICTCPred reaches a higher average accuracy of 0.895. The Relief-IFS (Incremental Feature Selection) method is then adopted to further improve prediction performance; with it, ICTCPred achieves an average accuracy of 0.910 on the training dataset. To evaluate prediction performance objectively, ICTCPred is compared with previous studies on the same independent testing dataset. The proposed method identifies the types of ion channel-targeted conotoxins better than previous studies, achieving the highest sensitivity for each type: 0.919 for Na⁺-targeted conotoxins, 1 for K⁺-targeted conotoxins, and 1 for Ca²⁺-targeted conotoxins. We anticipate that ICTCPred can serve as a candidate tool for predicting ion channel-targeted conotoxins.
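To make two of the ingredients named above concrete, the following is a minimal sketch of g-gap dipeptide composition and of SMOTE-style oversampling, written in plain Python. It is not the ICTCPred implementation: the function names, the example peptide string, the neighbour count k, and the toy minority points are all assumptions chosen for illustration, and the oversampler follows only the general SMOTE idea of interpolating between a minority sample and one of its nearest minority-class neighbours.

```python
import random
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def g_gap_dipeptide_composition(seq, g=0):
    """Frequencies of residue pairs separated by g intervening positions.

    g=0 gives the ordinary (adjacent) dipeptide composition.
    Pairs containing non-standard residues are skipped.
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority feature vectors.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        others = [m for m in minority if m is not x]
        # sort the remaining minority samples by squared Euclidean distance to x
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Hypothetical peptide string, used only to exercise the feature function.
features = g_gap_dipeptide_composition("CKGKGA", g=1)

# Toy 2-D minority class, used only to exercise the oversampler.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like_oversample(toy_minority, n_new=5)
```

In a pipeline of the kind the abstract describes, oversampling would be applied to the training data only, so that the independent testing dataset contains no synthetic samples.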
