DNAModAnnot: an R toolbox for DNA modification filtering and annotation

Abstract

Motivation: Long-read sequencing technologies can be used to detect and map DNA modifications at nucleotide resolution on a genome-wide scale. However, published software packages neglect the integration of genomic annotation and comprehensive filtering when analyzing patterns of modified bases detected using Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT) data. Here, we present DNA Modification Annotation (DNAModAnnot), an R package designed for the global analysis of DNA modification patterns using adapted filtering and visualization tools.

Results: We tested our package on PacBio sequencing data to analyze patterns of 6-methyladenine (6mA) in the ciliate Paramecium tetraurelia, in which high levels of 6mA were previously reported. We found the genome-wide distribution of 6mA in P. tetraurelia to be similar to that of other ciliates. We also performed 5-methylcytosine (5mC) analysis in human lymphoblastoid cells using ONT data and confirmed previously known patterns of 5mC. DNAModAnnot provides a toolbox for the genome-wide analysis of different DNA modifications using PacBio and ONT long-read sequencing data.

Availability and implementation: DNAModAnnot is distributed as an R package available via GitHub (https://github.com/AlexisHardy/DNAModAnnot).

Supplementary information: Supplementary data are available at Bioinformatics online.
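To make the two core operations named in the abstract (filtering detected modified bases, then annotating them with genomic features) more concrete, here is a minimal Python sketch. It is not DNAModAnnot's API, which is an R package; the field names, the score and coverage thresholds, the 0-based half-open coordinate convention, and the assumption that genes on one chromosome do not overlap are all hypothetical choices made for illustration only.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModSite:
    chrom: str
    pos: int        # 0-based position of the modified base
    strand: str     # "+" or "-"
    score: float    # hypothetical per-site detection score
    coverage: int   # number of reads covering the site


@dataclass(frozen=True)
class Feature:
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # 0-based, exclusive
    name: str


def filter_sites(sites, min_score=30.0, min_coverage=25):
    """Keep sites passing both a score and a coverage threshold (hypothetical defaults)."""
    return [s for s in sites if s.score >= min_score and s.coverage >= min_coverage]


def build_index(features):
    """Group features by chromosome and sort them by start for binary search.
    Assumes features on the same chromosome do not overlap."""
    index = defaultdict(list)
    for f in features:
        index[f.chrom].append(f)
    for chrom in index:
        index[chrom].sort(key=lambda f: f.start)
    starts = {chrom: [f.start for f in feats] for chrom, feats in index.items()}
    return index, starts


def annotate(site, index, starts) -> Optional[str]:
    """Return the name of the feature containing the site, or None if intergenic."""
    feats = index.get(site.chrom)
    if not feats:
        return None
    # Rightmost feature whose start is <= the site position.
    i = bisect_right(starts[site.chrom], site.pos) - 1
    if i >= 0 and feats[i].start <= site.pos < feats[i].end:
        return feats[i].name
    return None


# Hypothetical toy data, not taken from the paper.
genes = [
    Feature("scaffold_1", 100, 500, "geneA"),
    Feature("scaffold_1", 800, 1200, "geneB"),
]
sites = [
    ModSite("scaffold_1", 150, "+", 45.0, 40),  # passes, inside geneA
    ModSite("scaffold_1", 600, "-", 50.0, 60),  # passes, intergenic
    ModSite("scaffold_1", 900, "+", 12.0, 80),  # fails the score threshold
    ModSite("scaffold_2", 10, "+", 80.0, 10),   # fails the coverage threshold
]

kept = filter_sites(sites)
index, starts = build_index(genes)
labels = [annotate(s, index, starts) for s in kept]
fraction_in_genes = sum(label is not None for label in labels) / len(labels)
print(labels, fraction_in_genes)
```

Sorting features once and using binary search keeps each lookup logarithmic in the number of features per chromosome; real annotations with overlapping genes would need a proper interval structure, such as the overlap queries offered by Bioconductor's GenomicRanges in R.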
