Haplotype-centered mapping for improved alignments and genetic association studies

Next-Generation Sequencing experiments have been used to identify genotypes that are associated with many medical conditions. An important part of Next Generation read processing is the mapping of short reads to a reference genome. Although many algorithms have been created to perform this mapping, there are many reads that cannot be mapped because they are sequenced from low complexity regions of the genome (repeat regions) or from regions that are divergent from the reference genome. This research shows that when reads are first assembled into longer contigs that are then mapped to the reference genome, mapping efficiency and accuracy increases. When two contigs map to the same location, the contigs can provide haplotype information that can be used to perform association studies based on phased SNPs on a haplotype.

[1]  Gabor T. Marth,et al.  MOSAIK: A Hash-Based Algorithm for Accurate Next-Generation Sequencing Short-Read Mapping , 2013, PloS one.

[2]  B. Browning,et al.  A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. , 2009, American journal of human genetics.

[3]  Martin Goodson,et al.  Stampy: a statistical algorithm for sensitive and fast mapping of Illumina sequence reads. , 2011, Genome research.

[4]  Paul Scheet,et al.  A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. , 2006, American journal of human genetics.

[5]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[6]  Timothy B. Stockwell,et al.  Evaluation of next generation sequencing platforms for population targeted sequencing studies , 2009, Genome Biology.

[7]  D. J. Wheeler,et al.  A Block-sorting Lossless Data Compression Algorithm , 1994 .

[8]  Peter Donnelly,et al.  A comparison of bayesian methods for haplotype reconstruction from population genotype data. , 2003, American journal of human genetics.

[9]  Andrew P Morris,et al.  Basic statistical analysis in genetic case-control studies , 2011, Nature Protocols.

[10]  P. Crane,et al.  Alzheimer’s Disease: Analyzing the Missing Heritability , 2013, PloS one.

[11]  Helga Thorvaldsdóttir,et al.  Integrative Genomics Viewer , 2011, Nature Biotechnology.

[12]  Ruiqiang Li,et al.  SOAP: short oligonucleotide alignment program , 2008, Bioinform..

[13]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[14]  M. Stephens,et al.  Accounting for Decay of Linkage Disequilibrium in Haplotype Inference and Missing-data Imputation , 2022 .

[15]  Heng Li,et al.  A survey of sequence alignment algorithms for next-generation sequencing , 2010, Briefings Bioinform..

[16]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[17]  A. Gnirke,et al.  High-quality draft assemblies of mammalian genomes from massively parallel sequence data , 2010, Proceedings of the National Academy of Sciences.

[18]  Mark J. Clement,et al.  The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing , 2010, Bioinform..

[19]  Guillaume Rizk,et al.  Whole genome re-sequencing : lessons from unmapped reads , 2013 .

[20]  S. Koren,et al.  Assembly algorithms for next-generation sequencing data. , 2010, Genomics.

[21]  M. DePristo,et al.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. , 2010, Genome research.

[22]  Thomas L. Madden,et al.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. , 1997, Nucleic acids research.

[23]  Serban Nacu,et al.  Fast and SNP-tolerant detection of complex variants and splicing in short reads , 2010, Bioinform..

[24]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[25]  Thomas D. Wu,et al.  GMAP: a genomic mapping and alignment program for mRNA and EST sequence , 2005, Bioinform..

[26]  P. Donnelly,et al.  A new statistical method for haplotype reconstruction from population data. , 2001, American journal of human genetics.

[27]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[28]  Lucian Ilie,et al.  SHRiMP2: Sensitive yet Practical Short Read Mapping , 2011, Bioinform..

[29]  Mark T. W. Ebbert,et al.  Genetics of Alzheimer's Disease , 2013, BioMed research international.

[30]  O. Delaneau,et al.  Supplementary Information for ‘ Improved whole chromosome phasing for disease and population genetic studies ’ , 2012 .

[31]  H. Swerdlow,et al.  A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers , 2012, BMC Genomics.

[32]  Leping Li,et al.  ART: a next-generation sequencing read simulator , 2012, Bioinform..

[33]  Jian Wang,et al.  SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler , 2012, GigaScience.

[34]  B. Browning,et al.  Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. , 2007, American journal of human genetics.