Sequence-dependent prediction of recombination hotspots in Saccharomyces cerevisiae.

Meiotic recombination does not occur randomly across the genome; instead, it occurs at relatively high frequencies in some genomic regions (hotspots) and at relatively low frequencies in others (coldspots). Identifying hotspots and coldspots would shed light on the mechanism of recombination, but predicting them accurately remains an open question. In this study, we present a model that predicts hot/cold spots in yeast using the increment of diversity combined with quadratic discriminant analysis (IDQD), based on sequence k-mer frequencies. Five-fold cross-validation gave an overall prediction accuracy of 80.3%. Compared with other machine-learning algorithms, the IDQD approach performs as well as random forest (RF) and outperforms support vector machine (SVM) in identifying hotspots and coldspots. We also predicted increased recombination rates in the regions upstream of transcription start sites and downstream of transcription termination sites. Additionally, the genome-wide recombination map of yeast obtained with the IDQD model closely agrees with the experimentally generated map, especially in peak locations, although some fine-scale differences exist. Our results highlight the sequence dependency of recombination.
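As a rough illustration of the increment-of-diversity (ID) features that IDQD builds on, the sketch below counts k-mers, computes the diversity measure D(X) = N ln N - sum_i n_i ln n_i, and then computes ID(X, Y) = D(X+Y) - D(X) - D(Y) between a query sequence and pooled hotspot and coldspot "sources". The k values (1 to 3), the pooling of training sequences into one source per class, the helper names (`build_source`, `id_features`), and the toy sequences are illustrative assumptions, not the authors' exact setup. The quadratic discriminant step that IDQD applies to these features is not shown.

```python
import math
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers over the ACGT alphabet (all 4**k k-mers, zeros included)."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other non-ACGT symbols
            counts[kmer] += 1
    return counts


def diversity(counts):
    """Diversity measure D = N ln N - sum_i n_i ln n_i (empty categories contribute 0)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts.values() if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); it is 0 when X and Y have proportional compositions."""
    combined = {kmer: x[kmer] + y[kmer] for kmer in x}
    return diversity(combined) - diversity(x) - diversity(y)


def build_source(seqs, ks=(1, 2, 3)):
    """Hypothetical helper: pool the k-mer counts of all training sequences of one class."""
    source = {}
    for k in ks:
        pooled = {"".join(p): 0 for p in product("ACGT", repeat=k)}
        for s in seqs:
            for kmer, n in kmer_counts(s, k).items():
                pooled[kmer] += n
        source[k] = pooled
    return source


def id_features(seq, hot_source, cold_source, ks=(1, 2, 3)):
    """Hypothetical helper: for each k, the ID of the query against the hot and cold sources."""
    features = []
    for k in ks:
        x = kmer_counts(seq, k)
        features.append(increment_of_diversity(x, hot_source[k]))
        features.append(increment_of_diversity(x, cold_source[k]))
    return features


# Toy example with made-up GC-rich "hotspot" and AT-rich "coldspot" training sequences.
hot = build_source(["GCGCGCATGC", "GGCCGCGCAT"])
cold = build_source(["ATATATTTAA", "AATTATATAT"])
print(id_features("GCGCATGCGC", hot, cold))
```

A smaller ID means the query's k-mer composition is closer to that class source; with these toy inputs, the GC-rich query has a smaller ID against the GC-rich source than against the AT-rich one for every k. Because D equals N times the Shannon entropy of the composition, ID is never negative.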
