The Polygraph: A Data Structure for Genome Alignment and Variation Detection

Comparing whole genomes and identifying variation between them is an important and difficult bioinformatics task. We present the Polygraph, a data structure for reference-free multiple whole-genome alignment that can be used to identify genomic structural variation. The data structure is built from assembled genomes and preserves the genomic structure of each assembly. It avoids the “hairball” graph structure that can occur in other graph methods such as de Bruijn graphs. The Polygraph can be easily visualized and used to identify structural variants. We apply the Polygraph to Escherichia coli and Saccharomyces cerevisiae to find structural variants.

Keywords: genome alignment, comparative genomics, graph, homology, structural variants
