Chou's pseudo amino acid composition improves sequence-based antifreeze protein prediction.

Antifreeze proteins (AFPs) play a key role in the tolerance of living organisms to extremely cold temperatures and have a wide range of biotechnological applications. Because of their diversity, however, identifying them has been challenging for biologists. Earlier work in this area has not incorporated sequence order information, which is known to capture important properties of proteins and protein systems for prediction purposes. In this study, the effect of Chou's pseudo amino acid composition, which reflects the sequence order of proteins, was systematically explored for AFP prediction using support vector machines. Our findings suggest that introducing sequence order information helps identify AFPs with an accuracy of 84.75% on an independent test dataset, outperforming approaches such as AFP-Pred and iAFP. The relative performance, measured by Youden's index (sensitivity + specificity - 1), was 0.71 for our predictor (AFP-PseAAC), 0.48 for AFP-Pred, and 0.05 for iAFP. We hope this novel prediction approach will aid AFP-based research for biotechnological applications.
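
To make the two quantities named above concrete, the following is a minimal Python sketch, not the authors' implementation. It builds a pseudo amino acid composition vector in the general style of Chou's original formulation (20 composition terms plus `lam` sequence-order correlation terms) and computes Youden's index as defined in the abstract. The function names, the default values of `lam` and `w`, the toy sequence, and the use of the Kyte-Doolittle hydropathy scale as the only physicochemical property are all assumptions made for illustration; the abstract does not state which properties or parameter values were used.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, swapped in here purely as an
# illustrative property (an assumption, not taken from the paper).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop):
    """Rescale a property to zero mean and unit variance over the 20 residues."""
    values = [prop[a] for a in AMINO_ACIDS]
    mu, sd = mean(values), pstdev(values)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def pseaac(seq, props=(KYTE_DOOLITTLE,), lam=2, w=0.05):
    """Return a (20 + lam)-dimensional pseudo amino acid composition vector.

    lam (number of correlation tiers) and w (weight of the sequence-order
    terms) are hypothetical defaults chosen for this sketch.
    """
    seq = seq.upper()
    if any(r not in AMINO_ACIDS for r in seq):
        raise ValueError("sequence contains non-standard residues")
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")

    scaled = [standardize(p) for p in props]
    # Conventional amino acid composition: order-independent frequencies.
    freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]

    # Tier-k correlation factor: average squared property difference
    # between residues that are k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        pairs = [
            mean([(p[seq[i + k]] - p[seq[i]]) ** 2 for p in scaled])
            for i in range(len(seq) - k)
        ]
        thetas.append(mean(pairs))

    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def youden_index(sensitivity, specificity):
    """Youden's index: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


# Toy sequence, made up for illustration only.
features = pseaac("MKTAYIAKQRQISFVKSHFSRQ", lam=3)
```

A vector such as `features` could then be passed to any support vector machine implementation. The first 20 components carry the ordinary composition, while the last `lam` components are the part that encodes sequence order.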
