Pan-Genomes and de Bruijn Graphs Seminar Report

This report is based on the paper “Graphical pan-genome analysis with compressed suffix trees and the Burrows–Wheeler transform” (Baier et al. 2016). The pan-genome of a population is the collection of the genomic sequences of the individuals in that population, together with their genetic variations. Marcus et al. (2014) proposed the compressed de Bruijn graph as a suitable data structure for the pan-genome and introduced the splitMEM algorithm to construct this graph. Baier et al. (2016) improved on splitMEM and developed two algorithms that significantly outperformed it. Minkin et al. (2016) devised TwoPaCo, a scalable, low-memory algorithm that was even more efficient.