Detecting epigenetic motifs in low coverage and metagenomics settings

Background: It has recently become possible to rapidly and accurately detect epigenetic signatures in bacterial genomes using third generation sequencing data. Monitoring the speed at which a single polymerase inserts a base in the read strand enables one to infer whether a modification is present at that specific site on the template strand. These sites can be challenging to detect in the absence of high coverage and reliable reference genomes.

Methods: Here we provide a new method for detecting epigenetic motifs in bacteria on datasets with low coverage, incomplete references, and mixed samples (i.e., metagenomic data). Our approach treats motif inference as a kmer comparison problem. First, genomes (or contigs) are deconstructed into kmers. Then, native genome-wide distributions of interpulse durations (IPDs) for kmers are compared with the corresponding whole genome amplified (WGA, modification-free) IPD distributions using log likelihood ratios. Finally, kmers are ranked and greedily selected by iteratively correcting for sequences within a particular kmer's neighborhood.

Conclusions: Our method can detect multiple types of modifications, even at very low coverage and in the presence of mixed genomes. Additionally, we are able to predict modified motifs when genomes with "neighbor" modified motifs exist within the sample. Lastly, we show that these motifs can provide an alternative source of information by which to cluster metagenomic contigs, and that iterative refinement on these clustered contigs can further improve both sensitivity and specificity of motif detection.

Availability: https://github.com/alibashir/EMMCKmer
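The three steps described in the Methods can be illustrated with a minimal Python sketch. It is not the EMMCKmer implementation: the function names, the Gaussian model for IPD values, the choice of which base's IPD is assigned to a kmer (`offset`), and the use of a Hamming-distance neighborhood are all assumptions made for illustration. In particular, the abstract says neighboring kmers are "corrected for"; this sketch simply discards them, which is a cruder stand-in.

```python
import math
from collections import defaultdict


def kmer_ipds(seq, ipds, k, offset=0):
    """Step 1: deconstruct a sequence into kmers and group IPD values by kmer.

    `offset` (hypothetical parameter, 0 <= offset < k) picks which base
    within the kmer contributes its IPD value.
    """
    groups = defaultdict(list)
    for i in range(len(seq) - k + 1):
        groups[seq[i:i + k]].append(ipds[i + offset])
    return dict(groups)


def _fit(xs, min_var=1e-6):
    """Maximum-likelihood Gaussian fit; variance is floored to avoid log(0)."""
    mu = sum(xs) / len(xs)
    var = max(sum((x - mu) ** 2 for x in xs) / len(xs), min_var)
    return mu, var


def _loglik(xs, mu, var):
    """Gaussian log likelihood of xs under mean mu and variance var."""
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
               for x in xs)


def llr(native, wga):
    """Step 2: log likelihood ratio for one kmer.

    Null: native and WGA IPDs share one Gaussian.
    Alternative: each sample has its own Gaussian (assumed model).
    """
    pooled = list(native) + list(wga)
    alt = _loglik(native, *_fit(native)) + _loglik(wga, *_fit(wga))
    null = _loglik(pooled, *_fit(pooled))
    return alt - null


def score_kmers(native_groups, wga_groups, min_obs=3):
    """Score every kmer observed at least `min_obs` times in both samples."""
    return {
        kmer: llr(native_groups[kmer], wga_groups[kmer])
        for kmer in native_groups
        if kmer in wga_groups
        and len(native_groups[kmer]) >= min_obs
        and len(wga_groups[kmer]) >= min_obs
    }


def hamming(a, b):
    """Number of mismatching positions between two equal-length kmers."""
    return sum(x != y for x, y in zip(a, b))


def greedy_select(scores, threshold, radius=1):
    """Step 3: rank kmers and pick them greedily.

    After each pick, kmers within `radius` mismatches of it are dropped
    (a simplification of neighborhood correction).
    """
    selected = []
    remaining = dict(scores)
    while remaining:
        best = max(remaining, key=remaining.get)
        if remaining[best] < threshold:
            break
        selected.append(best)
        remaining = {kmer: s for kmer, s in remaining.items()
                     if hamming(kmer, best) > radius}
    return selected
```

In use, `kmer_ipds` would be run once on the native data and once on the WGA data, the two results passed to `score_kmers`, and the scores passed to `greedy_select` with a threshold chosen for the dataset. The score is two-sided, so it flags any shift in the IPD distribution, not only slowed incorporation.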
