Space-efficient merging of succinct de Bruijn graphs

We propose a new algorithm for merging the succinct representations of de Bruijn graphs introduced in [Bowe et al. WABI 2012]. The algorithm is based on the lightweight BWT merging approach of Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014]. It has the same asymptotic cost as the state-of-the-art tool for this problem, presented by Muggli et al. [bioRxiv 2017, Bioinformatics 2019], but uses less than half of that tool's working space. An important novel feature of our algorithm, not found in any existing tool, is that it can compute the variable-order succinct representation of the union graph within the same asymptotic time and space bounds.
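To make the underlying idea concrete, the following is a minimal sketch of Holt-McMillan style merging for plain multi-string BWTs, not of the de Bruijn graph algorithm itself. The function names `naive_bwt` and `merge_bwts`, the use of `$` as a shared terminator symbol, and the convention that terminators are ordered by string index are all assumptions made for illustration. The merge repeatedly refines a bit vector `z` that records, for each row of the merged BWT, which input it comes from, until `z` stops changing.

```python
from collections import Counter


def naive_bwt(strings):
    """Multi-string BWT by explicit suffix sorting (small inputs only).

    Assumes no string contains '$'. Every string has an implicit terminator
    that sorts before all other symbols; terminators are ordered by string
    index. The symbol preceding a whole string is written as '$'.
    """
    rows = []
    for j, s in enumerate(strings):
        for p in range(len(s) + 1):
            key = tuple((1, c) for c in s[p:]) + ((0, j),)
            rows.append((key, s[p - 1] if p > 0 else "$"))
    rows.sort(key=lambda row: row[0])
    return "".join(prev for _, prev in rows)


def merge_bwts(bwt0, bwt1):
    """Merge two multi-string BWTs by iterative refinement of z."""
    sources = (bwt0, bwt1)
    n0, n1 = len(bwt0), len(bwt1)
    m0, m1 = bwt0.count("$"), bwt1.count("$")  # number of strings per input
    counts = Counter(bwt0) + Counter(bwt1)

    # start[c]: first merged row whose suffix begins with symbol c.
    start, pos = {}, m0 + m1
    for c in sorted(counts):
        if c != "$":
            start[c] = pos
            pos += counts[c]

    # z[i] == b means merged row i is taken from input b.
    z = [0] * n0 + [1] * n1
    while True:
        # Rows for the bare terminators are fixed: input 0 before input 1.
        nxt = [0] * m0 + [1] * m1 + [None] * (n0 + n1 - m0 - m1)
        free = dict(start)
        k = [0, 0]
        for b in z:
            c = sources[b][k[b]]
            k[b] += 1
            if c != "$":
                # Stable placement into the bucket of symbol c.
                nxt[free[c]] = b
                free[c] += 1
        if nxt == z:
            break
        z = nxt

    # Interleave the two inputs according to the final z.
    k, out = [0, 0], []
    for b in z:
        out.append(sources[b][k[b]])
        k[b] += 1
    return "".join(out)
```

For example, `merge_bwts(naive_bwt(["ba"]), naive_bwt(["a"]))` returns `"aab$$"`, which equals `naive_bwt(["ba", "a"])`. The sketch keeps only the two inputs, the bit vector, and one scratch copy of it in memory, which illustrates why this family of methods is attractive when working space is the bottleneck.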

[1] P. Pevzner, et al. An Eulerian path approach to DNA fragment assembly, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[2] Christina Boucher, et al. Recoloring the Colored de Bruijn Graph, 2018, SPIRE.

[3] Giovanni Manzini, et al. Lightweight merging of compressed indices based on BWT variants, 2019, Theor. Comput. Sci.

[4] Christina Boucher, et al. Succinct Colored de Bruijn Graphs, 2016, bioRxiv.

[5] Kunihiko Sadakane, et al. Succinct de Bruijn Graphs, 2012, WABI.

[6] Christina Boucher, et al. Succinct De Bruijn Graph Construction for Massive Populations Through Space-Efficient Merging, 2017, bioRxiv.

[7] Christina Boucher, et al. Variable-Order de Bruijn Graphs, 2014, 2015 Data Compression Conference.

[8] Marco Previtali, et al. Bidirectional Variable-Order de Bruijn Graphs, 2016, LATIN.

[9] Prashant Pandey, et al. Rainbowfish: A Succinct Colored de Bruijn Graph Representation, 2017, bioRxiv.

[10] Leonard McMillan, et al. Constructing Burrows-Wheeler transforms of large string collections via merging, 2014, BCB.

[11] Rajeev Raman, et al. Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets, 2007, ACM Trans. Algorithms.

[12] Michael C. Schatz, et al. SplitMEM: a graphical algorithm for pan-genome analysis with suffix skips, 2014, Bioinform.

[13] Giovanni Manzini, et al. Lightweight BWT and LCP Merging via the Gap Algorithm, 2017, SPIRE.

[14] G. McVean, et al. De novo assembly and genotyping of variants using colored de Bruijn graphs, 2011, Nature Genetics.

[15] Leonard McMillan, et al. Merging of multi-string BWTs with applications, 2014, Bioinform.

[16] Gonzalo Navarro, et al. Compressed representations of sequences and full-text indexes, 2007, TALG.

[17] Guilherme P. Telles, et al. Inducing enhanced suffix arrays for string collections, 2017, Theor. Comput. Sci.

[18] Christina Boucher, et al. Building large updatable colored de Bruijn graphs via merging, 2019, Bioinform.

[19] Giovanni Manzini, et al. External memory BWT and LCP computation for sequence collections with applications, 2019, Algorithms for molecular biology : AMB.