AcalPred: A Sequence-Based Tool for Discriminating between Acidic and Alkaline Enzymes

The structure and activity of enzymes are influenced by the pH of their surroundings. Although many enzymes work well in the pH range from 6 to 8, some enzymes are efficient only in acidic (pH<5) or alkaline (pH>9) solution. Studies have demonstrated that the activities of enzymes correlate with their primary sequences. Judging from its amino acid sequence whether an enzyme is adapted to an acidic or an alkaline environment is therefore important both for clarifying molecular mechanisms and for designing highly efficient enzymes. In this study, we developed a sequence-based method to discriminate acidic enzymes from alkaline enzymes. Analysis of variance was used to select the optimal discriminating features from g-gap dipeptide compositions, and a support vector machine was used to build the prediction model. In a rigorous jackknife cross-validation, the method achieved an overall accuracy of 96.7%, correctly predicting 96.3% of acidic enzymes and 97.1% of alkaline enzymes. A comparison with previous methods shows that the proposed method is more accurate. On the basis of this method, we built an online web server called AcalPred, which is freely accessible at http://lin.uestc.edu.cn/server/AcalPred. We believe that AcalPred will become a powerful tool for studying enzyme adaptation to acidic or alkaline environments.
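The feature extraction and ranking steps named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of g-gap dipeptide composition (the frequency of residue pairs separated by g positions, normalized by the L − g − 1 pairs in a sequence of length L) and the standard one-way ANOVA F-statistic. The function names, the handling of non-standard residues, and the tie-breaking for zero variance are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

# The 20 standard amino acids and the 400 ordered residue pairs (assumed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, g=0):
    """Return a 400-dimensional vector of g-gap dipeptide frequencies.

    A g-gap dipeptide is the pair (residue i, residue i + g + 1), so g = 0
    gives ordinary adjacent dipeptides. Pairs containing non-standard
    residues are skipped (an assumption of this sketch).
    """
    seq = sequence.upper()
    n_pairs = len(seq) - g - 1
    if n_pairs <= 0:
        raise ValueError("sequence is too short for the requested gap")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = seq[i] + seq[i + g + 1]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] / n_pairs for d in DIPEPTIDES]


def anova_f_score(groups):
    """One-way ANOVA F-statistic of a single feature across classes.

    `groups` holds one list of feature values per class, for example
    [values_in_acidic_enzymes, values_in_alkaline_enzymes].
    """
    k = len(groups)
    n_total = sum(len(values) for values in groups)
    grand_mean = sum(sum(values) for values in groups) / n_total
    means = [sum(values) / len(values) for values in groups]
    between = sum(
        len(values) * (mean - grand_mean) ** 2
        for values, mean in zip(groups, means)
    ) / (k - 1)
    within = sum(
        (x - mean) ** 2
        for values, mean in zip(groups, means)
        for x in values
    ) / (n_total - k)
    if within == 0:
        # Degenerate case: no spread inside the classes.
        return float("inf") if between > 0 else 0.0
    return between / within
```

In such a pipeline, each of the 400 features would be scored with `anova_f_score`, the features ranked by F-value, and the top-ranked subset passed to a support vector machine. The classifier and the jackknife evaluation are omitted here.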
