MBGC: Multiple Bacteria Genome Compressor

Summary Genomes within the same species reveal large similarity, exploited by specialized multiple genome compressors. The existing algorithms and tools are however targeted at large, e.g., mammalian, genomes, and their performance on bacteria strains is mediocre. In this work, we propose MBGC, a specialized genome compressor making use of specific redundancy of bacterial genomes. Our tool is not only compression efficient, but also fast. On a collection of 168,311 bacterial genomes, totalling 587 GB, we achieve the compression ratio around the factor of 730, and the compression (resp. decompression) speed around 1070 MB/s (resp. 740 MB/s) using 8 hardware threads, on a computer with a 6-core / 12-thread CPU and a fast SSD, being about 4 times more succinct and more than an order of magnitude faster in the compression than our main competitors. Availability and implementation MBGC is freely available at github.com/kowallus/mbgc.