Comparative assembly hubs: Web-accessible browsers for comparative genomics

MOTIVATION: Researchers now have access to large volumes of genome sequences for comparative analysis, some generated by the plethora of public sequencing projects and, increasingly, by individual efforts. It is neither possible nor necessarily desirable for the public genome browsers to curate all these data. Instead, a wealth of powerful tools is emerging that lets users create their own visualizations and browsers.

RESULTS: We introduce a pipeline to easily generate collections of Web-accessible UCSC Genome Browsers interrelated by an alignment. It is intended to democratize our comparative genomic browser resources, serving the broad and growing community of evolutionary genomicists and facilitating easy public sharing via the Internet. Using the alignment, all annotations and the alignment itself can be efficiently viewed with reference to any genome in the collection, symmetrically. A new, intelligently scaled alignment display makes it simple to view all changes between the genomes at all levels of resolution, from substitutions to complex structural rearrangements, including duplications. To demonstrate this work, we create a comparative assembly hub containing 57 Escherichia coli and 9 Shigella genomes and show examples that highlight their unique biology.

AVAILABILITY AND IMPLEMENTATION: The source code is available as open source at https://github.com/glennhickey/progressiveCactus. The E. coli and Shigella genome hub is now a public hub listed on the UCSC browser public hubs Web page.
