Fibertools: fast and accurate DNA-m6A calling using single-molecule long-read sequencing

Single-molecule chromatin fiber sequencing relies on identifying DNA N6-methyladenine (m6A) at single-nucleotide resolution along individual sequencing reads. We present fibertools, a semi-supervised convolutional neural network that enables fast and accurate identification of both endogenous and exogenous m6A-marked bases using single-molecule long-read sequencing. Fibertools identifies m6A with high accuracy (>90% precision and recall) along multi-kilobase DNA molecules, with a ∼1,000-fold improvement in speed and the capacity to generalize to new sequencing chemistries.
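To make the precision and recall figures concrete, the following is a minimal sketch of how the two metrics could be computed for per-base m6A calls on a single read. It is not the fibertools implementation: the function name `precision_recall`, the representation of calls as sets of read positions, and the example positions are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def precision_recall(predicted: Iterable[int], truth: Iterable[int]) -> Tuple[float, float]:
    """Return (precision, recall) for called m6A positions against labeled ones.

    Hypothetical helper: positions are 0-based offsets along one read.
    """
    pred_set = set(predicted)
    true_set = set(truth)
    true_positives = len(pred_set & true_set)
    # Precision: fraction of called positions that are truly methylated.
    precision = true_positives / len(pred_set) if pred_set else 0.0
    # Recall: fraction of truly methylated positions that were called.
    recall = true_positives / len(true_set) if true_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up positions, for illustration only.
    called = [10, 25, 40, 77, 90]
    labeled = [10, 25, 40, 77, 120]
    p, r = precision_recall(called, labeled)
    print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy example four of the five called positions are correct and four of the five labeled positions are recovered, so both metrics equal 0.8.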
