An iterative algorithm for de novo optical map assembly

Optical mapping is a high-throughput genome mapping technology that captures long-range genomic information without the risk of PCR artifacts. Because optical maps span long DNA molecules, genome assemblies built from them contain far fewer gaps. However, the high error rate of optical map data poses an enormous challenge to assembly. Here we propose an iterative algorithm for de novo optical map assembly. In each iteration, only significant pairwise alignments that pass strict thresholds are used to construct accurate contigs, and these contigs serve as the input molecules for the next iteration. The strict thresholds ensure high-quality local assembly, while the iterative scheme progressively retains the connectivity between contigs. In practice, our IOMA (iterative optical map assembler) outperforms two popular assemblers used in the community on both simulated and real E. coli datasets.
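The iterative scheme can be illustrated with a toy sketch. It is not the IOMA implementation. It assumes that each optical map is a tuple of fragment sizes (kb) between consecutive labels, that all molecules share one orientation, and that a pairwise alignment is simply a suffix-prefix match whose fragment sizes agree within a relative tolerance (no missing or extra labels). The names `overlap`, `assemble_once`, `iterative_assemble`, `min_overlap` and `tol` are hypothetical. Each round keeps only alignments that pass the strict `min_overlap` threshold, accepts them greedily in decreasing overlap order while rejecting any that would give a molecule two successors, two predecessors or a cycle, chains the accepted alignments into contigs, and feeds those contigs back in as input molecules until a round produces no merges.

```python
from itertools import permutations

# Hypothetical representation: fragment sizes (kb) between consecutive labels.
OpticalMap = tuple[float, ...]


def overlap(a: OpticalMap, b: OpticalMap, tol: float) -> int:
    """Largest k such that the last k fragments of `a` match the first k of `b`
    within a relative sizing tolerance `tol` (toy alignment model); 0 if none."""
    for k in range(min(len(a), len(b)), 0, -1):
        if all(abs(x - y) <= tol * max(x, y) for x, y in zip(a[-k:], b[:k])):
            return k
    return 0


def assemble_once(mols: list[OpticalMap], min_overlap: int, tol: float) -> list[OpticalMap]:
    """One round: keep only alignments passing the strict threshold, then chain them."""
    edges = []
    for i, j in permutations(range(len(mols)), 2):
        k = overlap(mols[i], mols[j], tol)
        if k >= min_overlap:  # strict threshold: weaker alignments are ignored
            edges.append((k, i, j))
    edges.sort(reverse=True)  # most significant (longest) alignments first

    succ: dict[int, tuple[int, int]] = {}
    pred: dict[int, int] = {}
    parent = list(range(len(mols)))  # union-find, used to reject cycles

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for k, i, j in edges:
        if i in succ or j in pred or find(i) == find(j):
            continue  # branching or cycle-forming alignment: skip it
        succ[i] = (j, k)
        pred[j] = i
        parent[find(i)] = find(j)

    contigs = []
    for start in range(len(mols)):
        if start in pred:
            continue  # not the head of a chain
        contig, cur = mols[start], start
        while cur in succ:
            nxt, k = succ[cur]
            contig = contig + mols[nxt][k:]  # append the non-overlapping tail
            cur = nxt
        contigs.append(contig)
    return contigs


def iterative_assemble(mols: list[OpticalMap], min_overlap: int = 3,
                       tol: float = 0.05, max_iter: int = 10) -> list[OpticalMap]:
    """Repeat rounds, using each round's contigs as the next round's molecules."""
    for _ in range(max_iter):
        contigs = assemble_once(mols, min_overlap, tol)
        if len(contigs) == len(mols):
            break  # no merges in this round: converged
        mols = contigs
    return mols


if __name__ == "__main__":
    m1 = (10, 20, 15, 30, 25)
    m2 = (15.3, 29.5, 25.4, 12.1, 17.8, 22.5)  # sizing noise on one molecule
    m3 = (12, 18, 22, 40, 9, 14)
    contigs = iterative_assemble([m1, m2, m3], min_overlap=3, tol=0.05)
    print(len(contigs), len(contigs[0]))  # 1 11
```

Rejecting branching and cycle-forming alignments keeps each round's contigs linear. Raising `min_overlap` in this sketch makes merges rarer but safer; with `min_overlap=4` the three example molecules stay unmerged.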
