DeepSignal: detecting DNA methylation state from Nanopore sequencing reads using deep-learning.

MOTIVATION: Oxford Nanopore sequencing makes it possible to detect the methylation states of DNA bases directly from reads, without extra laboratory techniques. Novel computational methods are required to improve the accuracy and robustness of DNA methylation state prediction from Nanopore reads.

RESULTS: In this study, we develop DeepSignal, a deep learning method for detecting DNA methylation states from Nanopore sequencing reads. Tests on Nanopore reads of Homo sapiens (H. sapiens), Escherichia coli (E. coli) and pUC19 show that DeepSignal achieves higher performance than previous HMM-based methods at both the read level and the genome level in detecting 6mA and 5mC methylation states. DeepSignal achieves similar performance across different DNA methylation bases, different DNA methylation motifs, and both singleton and mixed DNA CpGs. Moreover, DeepSignal requires much lower coverage than HMM-based and statistics-based methods: it achieves above 90% accuracy in detecting 5mC and 6mA using only 2x coverage of reads. Furthermore, for DNA CpG methylation state prediction, DeepSignal achieves 90% correlation with bisulfite sequencing using just 20x coverage of reads, which is much better than HMM-based methods. Notably, DeepSignal can predict the methylation states of 5% more DNA CpGs, which previously could not be predicted by bisulfite sequencing. DeepSignal can be a robust and accurate method for detecting the methylation states of DNA bases.

AVAILABILITY: DeepSignal is publicly available at https://github.com/bioinfomaticsCSU/deepsignal.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
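To make the distinction between read-level and genome-level evaluation concrete, the following is a minimal sketch of how per-read methylation calls could be aggregated into per-site methylation frequencies and compared against bisulfite sequencing with a Pearson correlation. It is not DeepSignal's implementation. The function names (`site_frequencies`, `pearson`), the `(site, is_methylated)` input layout, and the `min_coverage` parameter are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def site_frequencies(read_calls, min_coverage=1):
    """Aggregate read-level calls into per-site methylation frequencies.

    read_calls: iterable of (site, is_methylated) pairs, one per read
    covering a site (hypothetical layout, not DeepSignal's output format).
    Sites covered by fewer than min_coverage reads are dropped.
    """
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated, total]
    for site, is_methylated in read_calls:
        counts[site][0] += 1 if is_methylated else 0
        counts[site][1] += 1
    return {
        site: methylated / total
        for site, (methylated, total) in counts.items()
        if total >= min_coverage
    }


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (invented for illustration only).
calls = [
    ("chr1:100", True), ("chr1:100", True), ("chr1:100", False),
    ("chr1:200", False), ("chr1:200", False),
    ("chr1:300", True),
]
nanopore = site_frequencies(calls, min_coverage=2)

# Hypothetical bisulfite frequencies for the same sites.
bisulfite = {"chr1:100": 0.7, "chr1:200": 0.1, "chr1:300": 0.9}

# Correlate only over sites present in both data sets.
shared = sorted(set(nanopore) & set(bisulfite))
r = pearson([nanopore[s] for s in shared], [bisulfite[s] for s in shared])
```

Raising `min_coverage` trades the number of sites retained for the reliability of each frequency estimate, which is why coverage requirements matter when comparing methods at the genome level.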
