A mostly traditional approach improves alignment of bisulfite-converted DNA

Cytosines in genomic DNA are sometimes methylated, which affects many biological processes and diseases. The standard way to measure methylation is to treat the DNA with bisulfite, which converts unmethylated cytosines to uracils that are read as thymines after amplification, and then to sequence the DNA and compare the reads to a reference genome sequence. We describe a method for the critical step of aligning the DNA reads to their correct genomic locations. Our method builds on classic alignment techniques, including likelihood-ratio scores and spaced seeds. In a realistic benchmark, our method has a better combination of sensitivity, specificity and speed than nine other high-throughput bisulfite aligners. This study enables more accurate and rational analysis of DNA methylation. It also illustrates how to adapt general-purpose alignment methods to a special case with distorted base patterns, which should be informative for other special cases such as ancient DNA and AT-rich genomes.
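To illustrate how a likelihood-ratio score can absorb bisulfite conversion, here is a minimal sketch. It is not the authors' implementation. It assumes a toy model in which a genomic base first mutates at a uniform rate and then, on the forward strand only, each C in the sequenced molecule is read as T with a fixed conversion probability. The function names, the parameters `conv` and `sub`, and the uniform background frequencies are all hypothetical choices for this example. The scores used by the method described here may be derived differently.

```python
import math

BASES = "ACGT"


def bisulfite_score_matrix(conv=0.9, sub=0.01, bg=None, scale=1.0):
    """Log-odds scores S[(g, r)] for genome base g aligned to read base r.

    Hypothetical toy model (forward strand only):
    1. the sequenced molecule's base s equals g with probability 1 - sub,
       otherwise it is one of the other 3 bases, each with probability sub / 3;
    2. bisulfite then turns a C in the molecule into a read T with
       probability conv (unconverted Cs stay C).
    """
    if bg is None:
        bg = {b: 0.25 for b in BASES}  # assumed uniform genome composition

    def p_src(s, g):
        # P(molecule base s | genome base g)
        return 1 - sub if s == g else sub / 3

    def p_read(r, s):
        # P(read base r | molecule base s): only C is affected by bisulfite
        if s == "C":
            return {"C": 1 - conv, "T": conv}.get(r, 0.0)
        return 1.0 if r == s else 0.0

    # Joint probability of an aligned pair: P(g, r) = bg[g] * sum_s P(s|g) P(r|s)
    joint = {
        (g, r): bg[g] * sum(p_src(s, g) * p_read(r, s) for s in BASES)
        for g in BASES
        for r in BASES
    }
    # Background read-base frequencies are the marginals of the joint model
    read_freq = {r: sum(joint[(g, r)] for g in BASES) for r in BASES}
    # Likelihood ratio: aligned-pair probability vs. independent chance pairing
    return {
        (g, r): scale * math.log(joint[(g, r)] / (bg[g] * read_freq[r]))
        for g in BASES
        for r in BASES
    }


def ungapped_score(ref, read, S):
    """Sum of pair scores for an ungapped alignment of equal-length strings."""
    if len(ref) != len(read):
        raise ValueError("ref and read must have equal length")
    return sum(S[(g, r)] for g, r in zip(ref, read))


S = bisulfite_score_matrix()
# A hypothetical read in which two unmethylated Cs were converted to T
print(ungapped_score("ACGTCC", "ATGTTC", S))
```

Under this model a read C can arise only from an unconverted C, so the matrix gives a positive score to a genomic C aligned to a read T but strongly penalizes a genomic T aligned to a read C. Reads from the opposite strand would need the analogous G-to-A matrix.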