ARG-walker: inference of individual specific strengths of meiotic recombination hotspots by population genomics analysis

Background: Meiotic recombination hotspots play important roles in various aspects of genomics, but the mechanisms that regulate their locations and strengths are not yet fully understood. Most existing algorithms for estimating recombination rates from sequence polymorphism data output only the average recombination rates of a population, although there is evidence that recombination rates are heterogeneous among individuals. For genome-wide association studies (GWAS) of recombination hotspots, an efficient algorithm that estimates the individualized strengths of recombination hotspots is highly desirable.

Results: In this work, we propose a novel graph mining algorithm named ARG-walker, based on random walks on ancestral recombination graphs (ARGs), to estimate individual-specific recombination hotspot strengths. Extensive simulations demonstrate that ARG-walker is able to distinguish the hot allele of a recombination hotspot from the cold allele. Using the output of ARG-walker, we performed GWAS on the phased haplotype data of the 22 autosomes of the HapMap Asian population samples of Chinese and Japanese (JPT+CHB). Significant cis-regulatory signals were detected, which are corroborated by the enrichment of the well-known 13-mer motif CCNCCNTNNCCNC of the PRDM9 protein. Moreover, two new DNA motifs were identified in the flanking regions of the significantly associated SNPs (single nucleotide polymorphisms); these are likely to be new cis-regulatory elements of meiotic recombination hotspots in the human genome.

Conclusions: Our results on both simulated and real data suggest that ARG-walker is a promising new method for estimating individual variation in recombination. In the future, it could be used to uncover the mechanisms of recombination regulation and of human diseases related to recombination hotspots.
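To make the degenerate 13-mer concrete, the following is a minimal sketch of how occurrences of CCNCCNTNNCCNC could be located in a DNA sequence on both strands. It is not the enrichment analysis used in this work; the function names (`reverse_complement`, `motif_to_regex`, `find_motif`) are hypothetical, and the sketch assumes that N stands for any of A, C, G or T.

```python
import re

# Degenerate 13-mer quoted in the abstract; N is assumed to mean any nucleotide.
PRDM9_MOTIF = "CCNCCNTNNCCNC"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile a degenerate motif into a regex; lookahead keeps overlapping hits."""
    body = "".join("[ACGT]" if base == "N" else base for base in motif)
    return re.compile("(?=(" + body + "))")


def find_motif(seq, motif=PRDM9_MOTIF):
    """Return sorted (start, strand) pairs for motif hits on either strand.

    Start positions are 0-based and always refer to the given (forward) sequence.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        regex = motif_to_regex(pattern)
        hits.extend((m.start(), strand) for m in regex.finditer(seq))
    return sorted(hits)


if __name__ == "__main__":
    example = "AAA" + "CCTCCCTAACCAC" + "TTT"  # hypothetical toy sequence
    print(find_motif(example))
```

In an association setting, a scan of this kind would typically be run on the flanking regions of associated SNPs and compared against a background set, which this sketch does not attempt.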
