Support vector machine with a Pearson VII function kernel for discriminating halophilic and non-halophilic proteins

Understanding how proteins adapt to hypersaline environments, and identifying such proteins, is a challenging task that would aid the design of stable proteins. Here, we systematically analyzed the normalized amino acid compositions of 2121 halophilic and 2400 non-halophilic proteins. The results showed that halophilic proteins contained more Asp at the expense of Lys, Ile, Cys and Met, contained fewer small and hydrophobic residues, and showed a large excess of acidic over basic amino acids. We then introduced a support vector machine method that discriminates halophilic from non-halophilic proteins using a novel Pearson VII universal function based kernel. Across the three validation methods, it achieved overall accuracies of 97.7%, 91.7% and 86.9% and outperformed other machine learning algorithms. We also examined the influence of protein size on prediction accuracy and found that the worse performance for small proteins might arise because some significant residues (Cys and Lys) were missing from those proteins.
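
As an illustration of the two ingredients named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that the composition feature is the plain fraction of each of the 20 standard residues (the exact normalization used in the study is not given here) and that the kernel takes the commonly published Pearson VII universal kernel form with width parameter sigma and tailing parameter omega. The function names and the default parameter values are hypothetical.

```python
import math

# The 20 standard amino acids (one-letter codes); the ordering is arbitrary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in a sequence.

    Assumption: non-standard symbols (e.g. X, B, Z) are simply ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    return [residues.count(a) / len(residues) for a in AMINO_ACIDS]


def puk_kernel(x, y, sigma=1.0, omega=1.0):
    """Pearson VII universal kernel between two feature vectors.

    K(x, y) = 1 / (1 + (2 * ||x - y|| * sqrt(2**(1/omega) - 1) / sigma)**2)**omega

    sigma and omega are illustrative defaults, not tuned values.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega


if __name__ == "__main__":
    # Two made-up toy sequences, used only to exercise the functions.
    acidic = aa_composition("DDEEDDAG")
    basic = aa_composition("KKRKIMCA")
    print(puk_kernel(acidic, acidic))  # identical vectors give 1.0
    print(puk_kernel(acidic, basic))   # dissimilar vectors give a value below 1
```

In this form the kernel equals 1 for identical vectors and falls to 0.5 when the Euclidean distance between two vectors equals sigma / 2, whatever the value of omega. A kernel function like this can be supplied to any SVM implementation that accepts a user-defined or precomputed kernel.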
