Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies

Since its launch in 2004, the open-source AMOS project has released several innovative DNA sequence analysis applications, including Hawkeye, a visual analytics tool for inspecting the structure of genome assemblies; the Assembly Forensics and FRCurve pipelines for systematically evaluating the quality of a genome assembly; and AMOScmp, the first comparative genome assembler. These applications have been used to assemble and analyze dozens of genomes, ranging in complexity from simple microbial species to mammalian genomes. Recent efforts have focused on improving support for the new data characteristics brought about by second- and now third-generation sequencing. This review describes the major components of AMOS in light of these challenges, with an emphasis on methods for assessing assembly quality and on the visual analytics capabilities of Hawkeye. These interactive graphical capabilities are essential for navigating and understanding the complexities of a genome assembly, from the overall genome structure down to individual bases. Hawkeye and AMOS are available open source at http://amos.sourceforge.net.
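The abstract names the FRCurve pipeline but does not define the curve it produces. What follows is a minimal sketch of one common formulation of a feature response curve. It assumes that each contig has already been annotated with a count of suspicious features, such as mis-assembly signals from a forensics pass. Under this formulation, contigs are sorted from longest to shortest. For each feature threshold tau, the curve reports the fraction of the genome covered by the longest prefix of contigs whose summed feature count does not exceed tau. The function name `feature_response_curve`, its `(length, feature_count)` input layout, and the use of an estimated genome size are illustrative assumptions. They are not the AMOS or FRCurve interface.

```python
import bisect
from itertools import accumulate
from typing import List, Tuple


def feature_response_curve(
    contigs: List[Tuple[int, int]], genome_size: int
) -> List[Tuple[int, float]]:
    """Hypothetical sketch of a feature response curve (not the AMOS API).

    contigs: (length, feature_count) pairs; feature counts are assumed to
    come from an upstream quality-checking step.
    genome_size: assumed (estimated) genome length used to normalize coverage.
    Returns (tau, coverage) pairs for every integer threshold tau.
    """
    # Longest contigs first, as in the usual FRC construction.
    ordered = sorted(contigs, key=lambda c: c[0], reverse=True)
    # Prefix sums: features and bases accumulated over the first k contigs.
    cum_features = list(accumulate(f for _, f in ordered))
    cum_length = list(accumulate(length for length, _ in ordered))
    total_features = cum_features[-1] if cum_features else 0

    curve = []
    for tau in range(total_features + 1):
        # cum_features is non-decreasing, so the entries <= tau form a prefix;
        # bisect_right gives the length k of that prefix.
        k = bisect.bisect_right(cum_features, tau)
        covered = cum_length[k - 1] if k else 0
        curve.append((tau, covered / genome_size))
    return curve


if __name__ == "__main__":
    example = [(500, 2), (1000, 1), (200, 0)]
    for tau, coverage in feature_response_curve(example, genome_size=2000):
        print(tau, coverage)
```

Under this formulation, a curve that reaches high coverage at a small tau points to an assembly whose large contigs carry few suspicious features. The prefix sums combined with `bisect_right` mean that each threshold takes a logarithmic lookup, so the contig list is not rescanned for every tau.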
