iTIS-PseTNC: a sequence-based predictor for identifying translation initiation site in human genes using pseudo trinucleotide composition.

Translation is a key process for gene expression. Timely identification of the translation initiation site (TIS) is very important for conducting in-depth genome analysis. With the avalanche of genome sequences generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively identifying TIS. Although some computational methods were proposed in this regard, none of them considered the global or long-range sequence-order effects of DNA, and hence their prediction quality was limited. To count this kind of effects, a new predictor, called "iTIS-PseTNC," was developed by incorporating the physicochemical properties into the pseudo trinucleotide composition, quite similar to the PseAAC (pseudo amino acid composition) approach widely used in computational proteomics. It was observed by the rigorous cross-validation test on the benchmark dataset that the overall success rate achieved by the new predictor in identifying TIS locations was over 97%. As a web server, iTIS-PseTNC is freely accessible at http://lin.uestc.edu.cn/server/iTIS-PseTNC. To maximize the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web server to obtain the desired results without the need to go through detailed mathematical equations, which are presented in this paper just for the integrity of the new prection method.

[1]  K. Chou Some remarks on protein attribute prediction and pseudo amino acid composition , 2010, Journal of Theoretical Biology.

[2]  K. Chou,et al.  Support vector machines for predicting membrane protein types by using functional domain composition. , 2003, Biophysical journal.

[3]  R. Jackson,et al.  The mechanism of eukaryotic translation initiation and principles of its regulation , 2010, Nature Reviews Molecular Cell Biology.

[4]  Ji Wan,et al.  TISdb: a database for alternative translation initiation in mammalian cells , 2013, Nucleic Acids Res..

[5]  Anders Gorm Pedersen,et al.  Neural Network Prediction of Translation Initiation Sites in Eukaryotes: Perspectives for EST and Genome Analysis , 1997, ISMB.

[6]  Wei Chen,et al.  iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition , 2014, Bioinform..

[7]  Maike Tech,et al.  An unsupervised classification scheme for improving predictions of prokaryotic TIS , 2006, BMC Bioinformatics.

[8]  K. Chou,et al.  Neural network prediction of the HIV-1 protease cleavage sites. , 1995, Journal of theoretical biology.

[9]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[10]  Xiaolong Wang,et al.  Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection , 2013, Bioinform..

[11]  Kuo-Chen Chou,et al.  Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..

[12]  John C. Wootton,et al.  Statistics of Local Complexity in Amino Acid Sequences and Sequence Databases , 1993, Comput. Chem..

[13]  Kuo-Chen Chou,et al.  Some remarks on predicting multi-label attributes in molecular biosystems. , 2013, Molecular bioSystems.

[14]  K. Chou Using subsite coupling to predict signal peptides. , 2001, Protein engineering.

[15]  Pufeng Du,et al.  PseAAC-General: Fast Building Various Modes of General Form of Chou’s Pseudo-Amino Acid Composition for Large-Scale Protein Datasets , 2014, International journal of molecular sciences.

[16]  Kuo-Chen Chou,et al.  NR-2L: A Two-Level Predictor for Identifying Nuclear Receptor Subfamilies Based on Sequence-Derived Features , 2011, PloS one.

[17]  J. Nieto,et al.  Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou's pseudo amino acid composition. , 2009, Journal of theoretical biology.

[18]  Artemis G. Hatzigeorgiou,et al.  Translation initiation start prediction in human cDNAs with high accuracy , 2002, Bioinform..

[19]  K. Chou,et al.  Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. , 2007, Journal of proteome research.

[20]  M. Kozak Initiation of translation in prokaryotes and eukaryotes. , 1999, Gene.

[21]  K. Chou,et al.  iGPCR-Drug: A Web Server for Predicting Interaction between GPCRs and Drugs in Cellular Networking , 2013, PloS one.

[22]  K. Chou,et al.  A vectorized sequence-coupling model for predicting HIV protease cleavage sites in proteins. , 1993, The Journal of biological chemistry.

[23]  Jacques Lapointe,et al.  Theoretical and experimental biology in one—A symposium in honour of Professor Kuo-Chen Chou’s 50th anniversary and Professor Richard Giegé’s 40th anniversary of their scientific careers , 2013 .

[24]  Shao-Ping Shi,et al.  Identifying protein quaternary structural attributes by incorporating physicochemical properties into the general form of Chou's PseAAC via discrete wavelet transform. , 2012, Molecular bioSystems.

[25]  Hassan Mohabatkar,et al.  Prediction of allergenic proteins by means of the concept of Chou's pseudo amino acid composition and a machine learning approach. , 2012, Medicinal chemistry (Shariqah (United Arab Emirates)).

[26]  K. R. Woods,et al.  Prediction of protein antigenic determinants from amino acid sequences. , 1981, Proceedings of the National Academy of Sciences of the United States of America.

[27]  K. Chou,et al.  Recent progress in protein subcellular location prediction. , 2007, Analytical biochemistry.

[28]  Ganapati Panda,et al.  A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction , 2010, Comput. Biol. Chem..

[29]  Yvan Saeys,et al.  Translation initiation site prediction on a genomic scale: beauty in simplicity , 2007, ISMB/ECCB.

[30]  A. Esmaeili,et al.  Prediction of GABAA receptor proteins using the concept of Chou's pseudo-amino acid composition and support vector machine. , 2011, Journal of theoretical biology.

[31]  K. Chou,et al.  Using Functional Domain Composition and Support Vector Machines for Prediction of Protein Subcellular Location* , 2002, The Journal of Biological Chemistry.

[32]  Wei Chen,et al.  iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition , 2013, Nucleic acids research.

[33]  K. Chou Prediction of human immunodeficiency virus protease cleavage sites in proteins. , 1996, Analytical biochemistry.

[34]  Kuo-Chen Chou,et al.  GPCR-2L: predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions. , 2011, Molecular bioSystems.

[35]  K. Chou,et al.  An optimization approach to predicting protein structural class from amino acid composition , 1992, Protein science : a publication of the Protein Society.

[36]  K. Chou,et al.  iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins , 2013, PeerJ.

[37]  Dong-Sheng Cao,et al.  propy: a tool to generate various modes of Chou's PseAAC , 2013, Bioinform..

[38]  H. Mohabatkar,et al.  Prediction of metalloproteinase family based on the concept of Chou’s pseudo amino acid composition using a machine learning approach , 2011, Journal of Structural and Functional Genomics.

[39]  Kuo-Chen Chou,et al.  Predicting membrane protein types by the LLDA algorithm. , 2008, Protein and peptide letters.

[40]  K. Chou Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology , 2009 .

[41]  K. Chou,et al.  A correlation-coefficient method to predicting protein-structural classes from amino acid compositions. , 1992, European journal of biochemistry.

[42]  K. Chou,et al.  A sequence‐coupled vector‐projection model for predicting the specificity of GalNAc‐transferase , 1995, Protein science : a publication of the Protein Society.

[43]  K. Chou,et al.  iEzy-Drug: A Web Server for Identifying the Interaction between Enzymes and Drugs in Cellular Networking , 2013, BioMed research international.

[44]  Kuo-Bin Li,et al.  Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition. , 2013, Journal of theoretical biology.

[45]  K. Chou Prediction of signal peptides using scaled window , 2001, Peptides.

[46]  Kuo-Chen Chou,et al.  iNR-Drug: Predicting the Interaction of Drugs with Nuclear Receptors in Cellular Networking , 2014, International journal of molecular sciences.

[47]  Kuo-Chen Chou,et al.  Prediction of G-protein-coupled receptor classes. , 2005, Journal of proteome research.

[48]  K. Chou,et al.  iHyd-PseAAC: Predicting Hydroxyproline and Hydroxylysine in Proteins by Incorporating Dipeptide Position-Specific Propensity into Pseudo Amino Acid Composition , 2014, International journal of molecular sciences.

[49]  K. Chou,et al.  A key driving force in determination of protein structural classes. , 1999, Biochemical and biophysical research communications.

[50]  Meng Wang,et al.  SLLE for predicting membrane protein types. , 2005, Journal of theoretical biology.

[51]  K. Chou,et al.  Prediction of protein structural classes. , 1995, Critical reviews in biochemistry and molecular biology.

[52]  K. Chou,et al.  iMethyl-PseAAC: Identification of Protein Methylation Sites via a Pseudo Amino Acid Composition Approach , 2014, BioMed research international.

[53]  P. Suganthan,et al.  AFP-Pred: A random forest approach for predicting antifreeze proteins from sequence-derived properties. , 2011, Journal of theoretical biology.

[54]  Loris Nanni,et al.  Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization , 2008, Amino Acids.

[55]  H. Mohabatkar,et al.  Predicting anticancer peptides with Chou's pseudo amino acid composition and investigating their mutagenicity via Ames test. , 2014, Journal of theoretical biology.

[56]  K. Chou,et al.  iRSpot-TNCPseAAC: Identify Recombination Spots with Trinucleotide Composition and Pseudo Amino Acid Components , 2014, International journal of molecular sciences.

[57]  Kuo-Chen Chou,et al.  Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo-amino acid composition to predict membrane protein types. , 2005, Biochemical and biophysical research communications.

[58]  Guo-Ping Zhou,et al.  Subcellular location prediction of apoptosis proteins , 2002, Proteins.

[59]  Kuo-Chen Chou,et al.  Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides. , 2007, Biochemical and biophysical research communications.

[60]  Mandana Behbahani,et al.  Predicting antibacterial peptides by the concept of Chou's pseudo-amino acid composition and machine learning methods. , 2012, Protein and peptide letters.

[61]  Tetsuo Nishikawa,et al.  Assessing protein coding region integrity in cDNA sequencing projects , 1998, Bioinform..

[62]  Xin Wang,et al.  PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou's pseudo-amino acid compositions. , 2012, Analytical biochemistry.

[63]  K. Chou,et al.  iSS-PseDNC: Identifying Splicing Sites Using Pseudo Dinucleotide Composition , 2014, BioMed research international.

[64]  Sheng-Xiang Lin,et al.  Theoretical and experimental biology in one — , 2013 .

[65]  K. Chou,et al.  Cell-PLoc 2.0: an improved package of web-servers for predicting subcellular localization of proteins in various organisms , 2010 .

[66]  Gunnar Rätsch,et al.  Engineering Support Vector Machine Kerneis That Recognize Translation Initialion Sites , 2000, German Conference on Bioinformatics.

[67]  K. Chou,et al.  iCDI-PseFpt: identify the channel-drug interaction in cellular networking with PseAAC and molecular fingerprints. , 2013, Journal of theoretical biology.

[68]  K. Chou,et al.  iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. , 2013, Analytical biochemistry.

[69]  C. Tanford Contribution of Hydrophobic Interactions to the Stability of the Globular Conformation of Proteins , 1962 .

[70]  Yonghong Wang,et al.  Recognition of Translation Initiation Sites of Eukaryotic Genes Based on an EM Algorithm , 2003, J. Comput. Biol..

[71]  K. Chou,et al.  REVIEW : Recent advances in developing web-servers for predicting protein attributes , 2009 .

[72]  K. Chou,et al.  iSNO-PseAAC: Predict Cysteine S-Nitrosylation Sites in Proteins by Incorporating Position Specific Amino Acid Propensity into Pseudo Amino Acid Composition , 2013, PloS one.