H3M2: detection of runs of homozygosity from whole-exome sequencing data

MOTIVATION: Runs of homozygosity (ROH) are sizable chromosomal stretches of homozygous genotypes, ranging in length from tens of kilobases to megabases. ROHs are relevant to both population and medical genetics, playing a role in predisposition to rare and common disorders. ROHs are commonly detected with single nucleotide polymorphism (SNP) microarrays, but attempts have also been made to use whole-exome sequencing (WES) data. Currently available methods, developed for the uniformly spaced marker maps of SNP arrays, are not well suited to the sparse and non-uniform distribution of WES target regions.

RESULTS: To meet the need for an approach specifically tailored to WES data, we developed H3M2, an original algorithm based on a heterogeneous hidden Markov model that incorporates inter-marker distances to detect ROHs from WES data. We evaluated the ability of H3M2 to correctly identify ROHs on synthetic chromosomes and examined its accuracy in detecting ROHs of different lengths (short, medium and long) from real 1000 Genomes Project data. H3M2 turned out to be more accurate than GERMLINE and PLINK, two state-of-the-art algorithms, especially in the detection of short and medium ROHs.

AVAILABILITY AND IMPLEMENTATION: H3M2 is a collection of bash, R and Fortran scripts and code and is freely available at https://sourceforge.net/projects/h3m2/.

CONTACT: albertomagi@gmail.com

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
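The abstract does not give the model equations. The Python sketch below illustrates what "a heterogeneous hidden Markov model that incorporates inter-marker distances" can mean in practice, and it is only one possible formulation. It decodes a two-state (non-ROH / ROH) HMM with the Viterbi algorithm. The per-marker observations are B-allele frequencies (BAF, alternative reads divided by depth). The probability of staying in the current state decays with the genomic distance d to the next marker, following A_ij(d) = w·[i = j] + (1 − w)·π_j with w = exp(−d/D). Several elements are assumptions for illustration rather than H3M2's published method: the BAF emission model, this transition form, the stationary distribution π, the scale D, and every default value. The function names (`viterbi_roh`, `roh_segments`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

NON_ROH, ROH = 0, 1
STATES = (NON_ROH, ROH)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def emission(baf: float, state: int, sigma: float, eps: float) -> float:
    """Assumed density of a BAF value given the hidden state (illustrative model)."""
    hom = 0.5 * normal_pdf(baf, 0.0, sigma) + 0.5 * normal_pdf(baf, 1.0, sigma)
    if state == ROH:
        core = hom  # only homozygous genotypes (BAF near 0 or 1) inside a run
    else:
        core = 0.5 * normal_pdf(baf, 0.5, sigma) + 0.5 * hom  # hypothetical het/hom mix
    # A small uniform component on [0, 1] absorbs genotyping errors and avoids log(0).
    return (1.0 - eps) * core + eps


def transition(i: int, j: int, d: float, pi: Sequence[float], scale: float) -> float:
    """Assumed distance-dependent transition: A_ij(d) = w*[i == j] + (1 - w)*pi_j."""
    w = math.exp(-d / scale)  # nearby markers (w close to 1) tend to share a state
    return w * (1.0 if i == j else 0.0) + (1.0 - w) * pi[j]


def viterbi_roh(baf: Sequence[float], pos: Sequence[int],
                pi: Sequence[float] = (0.9, 0.1), scale: float = 1e5,
                sigma: float = 0.05, eps: float = 1e-3) -> List[int]:
    """Most likely state path (0 = non-ROH, 1 = ROH) for markers sorted by position."""
    if len(baf) != len(pos):
        raise ValueError("baf and pos must have the same length")
    if not baf:
        return []
    if any(b <= a for a, b in zip(pos, pos[1:])):
        raise ValueError("positions must be strictly increasing")
    # Log-space Viterbi scores for the first marker.
    score = [math.log(pi[s]) + math.log(emission(baf[0], s, sigma, eps)) for s in STATES]
    back: List[List[int]] = []
    for t in range(1, len(baf)):
        d = pos[t] - pos[t - 1]  # inter-marker distance drives the transition matrix
        new_score, ptr = [], []
        for j in STATES:
            cand = [score[i] + math.log(transition(i, j, d, pi, scale)) for i in STATES]
            best = max(STATES, key=lambda i: cand[i])
            ptr.append(best)
            new_score.append(cand[best] + math.log(emission(baf[t], j, sigma, eps)))
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def roh_segments(states: Sequence[int], pos: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive ROH calls into (first, last) marker positions."""
    segs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for k, s in enumerate(states):
        if s == ROH and start is None:
            start = k
        elif s != ROH and start is not None:
            segs.append((pos[start], pos[k - 1]))
            start = None
    if start is not None:
        segs.append((pos[start], pos[len(states) - 1]))
    return segs


if __name__ == "__main__":
    # Synthetic chromosome: heterozygous sites in the flanks, a homozygous block in the middle.
    pos = [10_000 + 1_000 * i for i in range(80)]
    baf = [(0.0 if i % 2 else 1.0) if 20 <= i < 60 else (0.5 if i % 2 == 0 else 0.0)
           for i in range(80)]
    print(roh_segments(viterbi_roh(baf, pos), pos))
```

Tying the transition matrix to d lets sparse exome markers that lie far apart switch state cheaply. Densely packed markers within the same exon are strongly encouraged to stay in one state. A homogeneous HMM with a single fixed matrix cannot express this difference.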