PanCake: A Data Structure for Pangenomes

We present PanCake, a pangenome data structure for sets of related genomes. It bundles similar sequence regions into shared features, which are derived from genome-wide pairwise sequence alignments. We discuss the design of the data structure, basic operations on it, and methods to predict core genomes and singleton regions. In contrast to many other pangenome analysis tools, such as EDGAR or PGAT, PanCake is independent of gene annotations. Nevertheless, the core and singleton regions it identifies agree well with those reported by these tools. The PanCake data structure requires significantly less space than the individual sequence files combined.
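The idea of shared features can be illustrated with a minimal Python sketch. It is not PanCake's actual implementation or interface: the names `Occurrence`, `SharedFeature`, `core_features`, `singleton_features`, `stored_length` and `expanded_length`, as well as the toy genomes and sequences, are assumptions made purely for illustration. The sketch assumes that a region is "core" if it occurs in every genome of the set and a "singleton" if it occurs in exactly one genome.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Occurrence:
    """One place where a shared feature occurs in a genome (hypothetical layout)."""
    genome: str
    start: int  # 0-based, inclusive
    stop: int   # 0-based, exclusive


@dataclass
class SharedFeature:
    """A sequence region stored once and referenced by every genome containing it."""
    sequence: str
    occurrences: list = field(default_factory=list)

    def genomes(self):
        """Return the set of genome names in which this feature occurs."""
        return {occ.genome for occ in self.occurrences}


def core_features(features, all_genomes):
    """Features that occur in every genome of the set."""
    wanted = set(all_genomes)
    return [f for f in features if f.genomes() >= wanted]


def singleton_features(features):
    """Features that occur in exactly one genome."""
    return [f for f in features if len(f.genomes()) == 1]


def stored_length(features):
    """Bases stored when each shared feature is kept only once."""
    return sum(len(f.sequence) for f in features)


def expanded_length(features):
    """Bases needed when every genome is stored as its own sequence."""
    return sum(occ.stop - occ.start for f in features for occ in f.occurrences)


# Toy data (invented for illustration): three genomes built from three features.
genomes = ["A", "B", "C"]
features = [
    SharedFeature("ACGTACGT", [Occurrence("A", 0, 8),
                               Occurrence("B", 0, 8),
                               Occurrence("C", 0, 8)]),
    SharedFeature("TTGACA", [Occurrence("A", 8, 14),
                             Occurrence("B", 8, 14)]),
    SharedFeature("GGCC", [Occurrence("C", 8, 12)]),
]

core = core_features(features, genomes)
singletons = singleton_features(features)
```

In this toy set the first feature is the only core region and the last is the only singleton. Storing each feature once takes 18 bases, against 40 bases for the three genomes stored separately, which is the kind of saving the abstract refers to. No gene annotations are involved: the classification depends only on which genomes share a region.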
