iRSpot-TNCPseAAC: Identify Recombination Spots with Trinucleotide Composition and Pseudo Amino Acid Components

Meiosis and recombination are the two opposite aspects that coexist in a DNA system. As a driving force for evolution by generating natural genetic variations, meiotic recombination plays a very important role in the formation of eggs and sperm. Interestingly, the recombination does not occur randomly across a genome, but with higher probability in some genomic regions called “hotspots”, while with lower probability in so-called “coldspots”. With the ever-increasing amount of genome sequence data in the postgenomic era, computational methods for effectively identifying the hotspots and coldspots have become urgent as they can timely provide us with useful insights into the mechanism of meiotic recombination and the process of genome evolution as well. To meet the need, we developed a new predictor called “iRSpot-TNCPseAAC”, in which a DNA sample was formulated by combining its trinucleotide composition (TNC) and the pseudo amino acid components (PseAAC) of the protein translated from the DNA sample according to its genetic codes. The former was used to incorporate its local or short-rage sequence order information; while the latter, its global and long-range one. Compared with the best existing predictor in this area, iRSpot-TNCPseAAC achieved higher rates in accuracy, Mathew’s correlation coefficient, and sensitivity, indicating that the new predictor may become a useful tool for identifying the recombination hotspots and coldspots, or, at least, become a complementary tool to the existing methods. It has not escaped our notice that the aforementioned novel approach to incorporate the DNA sequence order information into a discrete model may also be used for many other genome analysis problems. The web-server for iRSpot-TNCPseAAC is available at http://www.jci-bioinfo.cn/iRSpot-TNCPseAAC. Furthermore, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the current web server to obtain their desired result without the need to follow the complicated mathematical equations.

[1]  C. Tanford Contribution of Hydrophobic Interactions to the Stability of the Globular Conformation of Proteins , 1962 .

[2]  R. C. Weast CRC Handbook of Chemistry and Physics , 1973 .

[3]  K. R. Woods,et al.  Prediction of protein antigenic determinants from amino acid sequences. , 1981, Proceedings of the National Academy of Sciences of the United States of America.

[4]  W. H. Elliott,et al.  Data for Biochemical Research , 1986 .

[5]  M. Higgins Data for Biochemical Research, 3rd Edition , 1987 .

[6]  W. M. Haynes CRC Handbook of Chemistry and Physics , 1990 .

[7]  A. Sherman,et al.  Multiple sites for double‐strand breaks in whole meiotic chromosomes of Saccharomyces cerevisiae. , 1992, The EMBO journal.

[8]  A. Goldman,et al.  Meiotic recombination hotspots. , 1995, Annual review of genetics.

[9]  K. Chou,et al.  Neural network prediction of the HIV-1 protease cleavage sites. , 1995, Journal of theoretical biology.

[10]  K. Chou,et al.  Prediction of protein structural classes. , 1995, Critical reviews in biochemistry and molecular biology.

[11]  K. Chou A novel approach to predicting protein structural classes in a (20–1)‐D amino acid composition space , 1995, Proteins.

[12]  A. Nicolas,et al.  Clustering of meiotic double-strand breaks on yeast chromosome III. , 1997, Proceedings of the National Academy of Sciences of the United States of America.

[13]  W. P. Wahls Meiotic recombination hotspots: shaping the genome and insights into hypervariable minisatellite DNA change. , 1998, Current topics in developmental biology.

[14]  K. Chou,et al.  Prediction of protein secondary structure content. , 1999, Protein engineering.

[15]  Kuo-Chen Chou,et al.  Notes & TipsArtificial Neural Network Model for Predicting α-Turn Types☆ , 1999 .

[16]  K. Chou,et al.  Using Pair-Coupled Amino Acid Composition to Predict Protein Secondary Structure Content , 1999, Journal of protein chemistry.

[17]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[18]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[19]  T. Petes,et al.  Meiotic recombination hot spots and cold spots , 2001, Nature Reviews Genetics.

[20]  K. Chou,et al.  Prediction of protein signal sequences and their cleavage sites , 2001, Proteins.

[21]  K. Chou Using subsite coupling to predict signal peptides. , 2001, Protein engineering.

[22]  K. Chou,et al.  Using Functional Domain Composition and Support Vector Machines for Prediction of Protein Subcellular Location* , 2002, The Journal of Biological Chemistry.

[23]  K. Chou,et al.  Support vector machines for predicting membrane protein types by using functional domain composition. , 2003, Biophysical journal.

[24]  Kuo-Chen Chou,et al.  Prediction of enzyme family classes. , 2003, Journal of proteome research.

[25]  A. Jeffreys,et al.  Meiotic recombination hot spots and human DNA diversity. , 2004, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[26]  Kuo-Chen Chou,et al.  Predicting subcellular localization of proteins in a hybridization space , 2004, Bioinform..

[27]  Kuo-Chen Chou,et al.  Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo-amino acid composition to predict membrane protein types. , 2005, Biochemical and biophysical research communications.

[28]  Kuo-Chen Chou,et al.  Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..

[29]  Meng Wang,et al.  SLLE for predicting membrane protein types. , 2005, Journal of theoretical biology.

[30]  Kuo-Chen Chou,et al.  Prediction of G-protein-coupled receptor classes. , 2005, Journal of proteome research.

[31]  Kuo-Chen Chou,et al.  Boosting classifier for predicting protein domain structural class. , 2005, Biochemical and biophysical research communications.

[32]  K. Chou,et al.  Prediction of linear B-cell epitopes using amino acid pair antigenicity scale , 2007, Amino Acids.

[33]  Kuo-Chen Chou,et al.  Fuzzy KNN for predicting membrane protein types from pseudo-amino acid composition. , 2006, Journal of theoretical biology.

[34]  Kuo-Chen Chou,et al.  Large-scale predictions of gram-negative bacterial protein subcellular locations. , 2006, Journal of proteome research.

[35]  K. Chou,et al.  Hum-PLoc: a novel ensemble classifier for predicting human protein subcellular localization. , 2006, Biochemical and biophysical research communications.

[36]  Kuo-Chen Chou,et al.  Using stacked generalization to predict membrane protein types based on pseudo-amino acid composition. , 2006, Journal of theoretical biology.

[37]  D. Kaback,et al.  Patterns of meiotic double-strand breakage on native and artificial yeast chromosomes , 1996, Chromosoma.

[38]  Kuo-Chen Chou,et al.  Prediction of protease types in a hybridization space. , 2006, Biochemical and biophysical research communications.

[39]  Kuo-Chen Chou,et al.  Predicting eukaryotic protein subcellular location by fusing optimized evidence-theoretic K-Nearest Neighbor classifiers. , 2006, Journal of proteome research.

[40]  Loris Nanni,et al.  Genetic programming for creating Chou’s pseudo amino acid based features for submitochondria localization , 2008, Amino Acids.

[41]  Zhanchao Li,et al.  Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. , 2007, Journal of theoretical biology.

[42]  K. Chou,et al.  Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. , 2007, Journal of proteome research.

[43]  Kuo-Chen Chou,et al.  Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides. , 2007, Biochemical and biophysical research communications.

[44]  Qianzhong Li,et al.  Using pseudo amino acid composition to predict protein structural class: Approached by incorporating 400 dipeptide components , 2007, J. Comput. Chem..

[45]  Shao-Wu Zhang,et al.  Using the concept of Chou’s pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies , 2008, Amino Acids.

[46]  K. Ferguson,et al.  Abnormal meiotic recombination in infertile men and its association with sperm aneuploidy. , 2007, Human molecular genetics.

[47]  K. Chou,et al.  Predicting protein structural classes with pseudo amino acid composition: an approach using geometric moments of cellular automaton image. , 2008, Journal of theoretical biology.

[48]  Guangya Zhang,et al.  Predicting the cofactors of oxidoreductases based on amino acid composition distribution and Chou's amphiphilic pseudo-amino acid composition. , 2008, Journal of theoretical biology.

[49]  S. Keeney Spo11 and the Formation of DNA Double-Strand Breaks in Meiosis. , 2008, Genome dynamics and stability.

[50]  Shao-Wu Zhang,et al.  Using Chou’s pseudo amino acid composition to predict protein quaternary structure: a sequence-segmented PseAAC approach , 2008, Amino Acids.

[51]  K. Chou,et al.  PseAAC: a flexible web server for generating various kinds of protein pseudo amino acid composition. , 2008, Analytical biochemistry.

[52]  D. Carrell,et al.  Analysis of the Meiotic Recombination Gene REC8 for Sequence Variations in a Population with Severe Male Factor Infertility , 2008, Systems biology in reproductive medicine.

[53]  Kuo-Chen Chou,et al.  A top-down approach to enhance the power of predicting human protein subcellular localization: Hum-mPLoc 2.0. , 2009, Analytical biochemistry.

[54]  Yanzhi Guo,et al.  Using the augmented Chou's pseudo amino acid composition for predicting protein submitochondria locations based on auto covariance approach. , 2009, Journal of theoretical biology.

[55]  J. Nieto,et al.  Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou's pseudo amino acid composition. , 2009, Journal of theoretical biology.

[56]  Hao Lin,et al.  Prediction of Subcellular Localization of Apoptosis Protein Using Chou’s Pseudo Amino Acid Composition , 2009, Acta biotheoretica.

[57]  Jianding Qiu,et al.  Prediction of G-protein-coupled receptor classes based on the concept of Chou's pseudo amino acid composition: an approach from discrete wavelet transform. , 2009, Analytical biochemistry.

[58]  K. Chou Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology , 2009 .

[59]  M. Esmaeili,et al.  Using the concept of Chou's pseudo amino acid composition for risk type prediction of human papillomaviruses. , 2010, Journal of theoretical biology.

[60]  Thomas Martinetz,et al.  Prediction of apoptosis protein locations with genetic algorithms and support vector machines through a new mode of pseudo amino acid composition. , 2010, Protein and peptide letters.

[61]  Menglong Li,et al.  SecretP: identifying bacterial secreted proteins by fusing new features into Chou's pseudo-amino acid composition. , 2010, Journal of theoretical biology.

[62]  Ganapati Panda,et al.  A novel feature representation method based on Chou's pseudo amino acid composition for protein structural class prediction , 2010, Comput. Biol. Chem..

[63]  Hassan Mohabatkar,et al.  Prediction of cyclin proteins using Chou's pseudo amino acid composition. , 2010, Protein and peptide letters.

[64]  K. Chou,et al.  iLoc-Euk: A Multi-Label Classifier for Predicting the Subcellular Localization of Singleplex and Multiplex Eukaryotic Proteins , 2011, PloS one.

[65]  K. Chou,et al.  iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model , 2011, PloS one.

[66]  Hui Ding,et al.  Predicting ion channels and their types by the dipeptide mode of pseudo amino acid composition. , 2011, Journal of theoretical biology.

[67]  H. Tempest,et al.  Meiotic recombination and male infertility: from basic science to clinical reality? , 2011, Asian journal of andrology.

[68]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[69]  Qian-zhong Li,et al.  Predicting protein submitochondria locations by combining different descriptors into the general form of Chou’s pseudo amino acid composition , 2011, Amino Acids.

[70]  Kuo-Chen Chou,et al.  A Multi-Label Classifier for Predicting the Subcellular Localization of Gram-Negative Bacterial Proteins with Both Single and Multiple Sites , 2011, PloS one.

[71]  H. Mohabatkar,et al.  Prediction of metalloproteinase family based on the concept of Chou’s pseudo amino acid composition using a machine learning approach , 2011, Journal of Structural and Functional Genomics.

[72]  Kuo-Chen Chou,et al.  GPCR-2L: predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions. , 2011, Molecular bioSystems.

[73]  K. Chou Some remarks on protein attribute prediction and pseudo amino acid composition , 2010, Journal of Theoretical Biology.

[74]  Nak-Kyeong Kim,et al.  Analysis of Biological Features Associated with Meiotic Recombination Hot and Cold Spots in Saccharomyces cerevisiae , 2011, PloS one.

[75]  K. Chou,et al.  iLoc-Virus: a multi-label learning classifier for identifying the subcellular localization of virus proteins with both single and multiple sites. , 2011, Journal of theoretical biology.

[76]  Dongsheng Zou,et al.  Supersecondary structure prediction using Chou's pseudo amino acid composition , 2011, J. Comput. Chem..

[77]  A. Esmaeili,et al.  Prediction of GABAA receptor proteins using the concept of Chou's pseudo-amino acid composition and support vector machine. , 2011, Journal of theoretical biology.

[78]  K. Chou,et al.  iLoc-Plant: a multi-label classifier for predicting the subcellular localization of plant proteins with both single and multiple sites. , 2011, Molecular bioSystems.

[79]  P. Suganthan,et al.  AFP-Pred: A random forest approach for predicting antifreeze proteins from sequence-derived properties. , 2011, Journal of theoretical biology.

[80]  Kuo-Chen Chou,et al.  iNR-PhysChem: A Sequence-Based Predictor for Identifying Nuclear Receptors and Their Subfamilies via Physical-Chemical Property Matrix , 2012, PloS one.

[81]  K. Chou,et al.  iLoc-Hum: using the accumulation-label scale to predict subcellular locations of human proteins with both single and multiple sites. , 2012, Molecular bioSystems.

[82]  Suyu Mei,et al.  Multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization. , 2012, Journal of theoretical biology.

[83]  Suyu Mei,et al.  Predicting plant protein subcellular multi-localization by Chou's PseAAC formulation based multi-label homolog knowledge transfer learning. , 2012, Journal of theoretical biology.

[84]  K. Chou,et al.  Identification of Colorectal Cancer Related Genes with mRMR and Shortest Path in Protein-Protein Interaction Network , 2012, PloS one.

[85]  Wei Chen,et al.  iNuc-PhysChem: A Sequence-Based Predictor for Identifying Nucleosomes via Physicochemical Properties , 2012, PloS one.

[86]  Zia-ur-Rehman,et al.  Identifying GPCRs and their types with Chou's pseudo amino acid composition: an approach from multi-scale energy representation and position specific scoring matrix. , 2012, Protein and peptide letters.

[87]  Dinesh Gupta,et al.  Identifying Bacterial Virulent Proteins by Fusing a Set of Classifiers Based on Variants of Chou's Pseudo Amino Acid Composition and on Evolutionary Information , 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[88]  Shao-Ping Shi,et al.  Identifying protein quaternary structural attributes by incorporating physicochemical properties into the general form of Chou's PseAAC via discrete wavelet transform. , 2012, Molecular bioSystems.

[89]  Qian-zhong Li,et al.  Predict mycobacterial proteins subcellular locations by incorporating pseudo-average chemical shift into the general form of Chou's pseudo amino acid composition. , 2012, Journal of theoretical biology.

[90]  Kuo-Chen Chou,et al.  Hepatitis C Virus Network Based Classification of Hepatocellular Cirrhosis and Carcinoma , 2012, PloS one.

[91]  Maqsood Hayat,et al.  Discriminating outer membrane proteins with Fuzzy K-nearest Neighbor algorithms based on the general form of Chou's PseAAC. , 2012, Protein and peptide letters.

[92]  K. Chou,et al.  iLoc-Gpos: a multi-layer classifier for predicting the subcellular localization of singleplex and multiplex Gram-positive bacterial proteins. , 2012, Protein and peptide letters.

[93]  Xin Wang,et al.  PseAAC-Builder: a cross-platform stand-alone program for generating various special Chou's pseudo-amino acid compositions. , 2012, Analytical biochemistry.

[94]  K. Chou,et al.  Predicting Secretory Proteins of Malaria Parasite by Incorporating Sequence Evolution Information into Pseudo Amino Acid Composition via Grey System Model , 2012, PloS one.

[95]  Jia Liu,et al.  Sequence-dependent prediction of recombination hotspots in Saccharomyces cerevisiae. , 2012, Journal of theoretical biology.

[96]  K. Chou,et al.  Predicting Anatomical Therapeutic Chemical (ATC) Classification of Drugs by Integrating Chemical-Chemical Interactions and Similarities , 2012, PloS one.

[97]  Hassan Mohabatkar,et al.  Prediction of allergenic proteins by means of the concept of Chou's pseudo amino acid composition and a machine learning approach. , 2012, Medicinal chemistry (Shariqah (United Arab Emirates)).

[98]  Dimitris N. Georgiou,et al.  A Short Survey on Genetic Sequences, Chou’s Pseudo Amino Acid Composition and its Combination with Fuzzy Set Theory , 2013 .

[99]  N. Rao,et al.  PREDICTING SUBCHLOROPLAST LOCATIONS OF PROTEINS BASED ON THE GENERAL FORM OF CHOU'S PSEUDO AMINO ACID COMPOSITION: APPROACHED FROM OPTIMAL TRIPEPTIDE COMPOSITION , 2013 .

[100]  Jorng-Tzong Horng,et al.  EuLoc: a web-server for accurately predict protein subcellular localization in eukaryotes by incorporating various features of sequence segments into the general form of Chou’s PseAAC , 2013, Journal of Computer-Aided Molecular Design.

[101]  Mandana Behbahani,et al.  Predicting antibacterial peptides by the concept of Chou's pseudo-amino acid composition and machine learning methods. , 2012, Protein and peptide letters.

[102]  Xiaolong Wang,et al.  Protein Remote Homology Detection by Combining Chou’s Pseudo Amino Acid Composition and Profile‐Based Protein Representation , 2013, Molecular informatics.

[103]  Kuo-Chen Chou,et al.  Some remarks on predicting multi-label attributes in molecular biosystems. , 2013, Molecular bioSystems.

[104]  K. Chou,et al.  iLoc-Animal: a multi-label learning classifier for predicting subcellular localization of animal proteins. , 2013, Molecular bioSystems.

[105]  S. Kung,et al.  GOASVM: a subcellular location predictor by incorporating term-frequency gene ontology into the general form of Chou's pseudo-amino acid composition. , 2013, Journal of theoretical biology.

[106]  Chao Huang,et al.  Using radial basis function on the general form of Chou's pseudo amino acid composition and PSSM to predict subcellular locations of proteins with both single and multiple sites , 2013, Biosyst..

[107]  Jingqi Yuan,et al.  A Multilabel Model Based on Chou’s Pseudo–Amino Acid Composition for Identifying Membrane Proteins with Both Single and Multiple Functional Types , 2013, The Journal of Membrane Biology.

[108]  K. Chou,et al.  iEzy-Drug: A Web Server for Identifying the Interaction between Enzymes and Drugs in Cellular Networking , 2013, BioMed research international.

[109]  J. Sekelsky,et al.  Meiotic and Mitotic Recombination in Meiosis , 2013, Genetics.

[110]  K. Chou,et al.  iCDI-PseFpt: identify the channel-drug interaction in cellular networking with PseAAC and molecular fingerprints. , 2013, Journal of theoretical biology.

[111]  Jacques Lapointe,et al.  Theoretical and experimental biology in one—A symposium in honour of Professor Kuo-Chen Chou’s 50th anniversary and Professor Richard Giegé’s 40th anniversary of their scientific careers , 2013 .

[112]  Wei Chen,et al.  iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition , 2013, Nucleic acids research.

[113]  Liang Fu,et al.  Using ensemble SVM to identify human GPCRs N-linked glycosylation sites based on the general form of Chou's PseAAC. , 2013, Protein engineering, design & selection : PEDS.

[114]  K. Chou,et al.  iSNO-PseAAC: Predict Cysteine S-Nitrosylation Sites in Proteins by Incorporating Position Specific Amino Acid Propensity into Pseudo Amino Acid Composition , 2013, PloS one.

[115]  Dong-Sheng Cao,et al.  propy: a tool to generate various modes of Chou's PseAAC , 2013, Bioinform..

[116]  K. Chou,et al.  iGPCR-Drug: A Web Server for Predicting Interaction between GPCRs and Drugs in Cellular Networking , 2013, PloS one.

[117]  K. Chou,et al.  iSNO-AAPair: incorporating amino acid pairwise coupling into PseAAC for predicting cysteine S-nitrosylation sites in proteins , 2013, PeerJ.

[118]  Kuo-Chen Chou,et al.  Signal Propagation in Protein Interaction Network during Colorectal Cancer Progression , 2013, BioMed research international.

[119]  K. Chou,et al.  iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition. , 2013, Analytical biochemistry.

[120]  K. Chou,et al.  iAMP-2L: a two-level multi-label classifier for identifying antimicrobial peptides and their functional types. , 2013, Analytical biochemistry.

[121]  Jing-Qi Yuan,et al.  Predicting protein subchloroplast locations with both single and multiple sites via three different modes of Chou's pseudo amino acid compositions. , 2013, Journal of theoretical biology.

[122]  Qian-Zhong Li,et al.  Discriminating bioluminescent proteins by incorporating average chemical shift and evolutionary information into the general form of Chou's pseudo amino acid composition. , 2013, Journal of theoretical biology.

[123]  Kuo-Bin Li,et al.  Predicting membrane protein types by incorporating protein topology, domains, signal peptides, and physicochemical properties into the general form of Chou's pseudo amino acid composition. , 2013, Journal of theoretical biology.

[124]  H. Mohabatkar,et al.  Predicting anticancer peptides with Chou's pseudo amino acid composition and investigating their mutagenicity via Ames test. , 2014, Journal of theoretical biology.

[125]  Xiaolong Wang,et al.  Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection , 2013, Bioinform..