Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads

Recent improvements in technology have made DNA sequencing dramatically faster and more efficient than ever before. The new technologies produce highly accurate sequences, but one drawback is that the most efficient technology produces the shortest read lengths. Short-read sequencing has been applied successfully to resequence the human genome and those of other species but not to whole-genome sequencing of novel organisms. Here we describe the sequencing and assembly of a novel clinical isolate of Pseudomonas aeruginosa, strain PAb1, using very short read technology. From 8,627,900 reads, each 33 nucleotides in length, we assembled the genome into one scaffold of 76 ordered contiguous sequences containing 6,290,005 nucleotides, including one contig spanning 512,638 nucleotides, plus an additional 436 unordered contigs containing 416,897 nucleotides. Our method includes a novel gene-boosting algorithm that uses amino acid sequences from predicted proteins to build a better assembly. This study demonstrates the feasibility of very short read sequencing for the sequencing of bacterial genomes, particularly those for which a related species has been sequenced previously, and expands the potential application of this new technology to most known prokaryotic species.
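The read and contig counts reported above imply a rough sequencing depth, which the short sketch below works out. It assumes that the total assembled length (the scaffold plus the unordered contigs) approximates the true genome size. The abstract does not state that, so the result is only an estimate. The function and variable names are illustrative and do not come from the authors' software.

```python
# Back-of-the-envelope depth estimate from the figures quoted in the abstract.
# Assumption (not stated in the source): assembled length ~ true genome length.

READ_COUNT = 8_627_900      # reads reported in the abstract
READ_LENGTH = 33            # nucleotides per read
SCAFFOLD_BASES = 6_290_005  # nucleotides in the 76 ordered contigs
UNORDERED_BASES = 416_897   # nucleotides in the 436 unordered contigs


def sequenced_bases(read_count: int, read_length: int) -> int:
    """Total number of nucleotides produced by the sequencing run."""
    return read_count * read_length


def mean_coverage(total_bases: int, genome_length: int) -> float:
    """Average number of reads covering each genome position."""
    return total_bases / genome_length


total_bases = sequenced_bases(READ_COUNT, READ_LENGTH)
assembled_length = SCAFFOLD_BASES + UNORDERED_BASES
coverage = mean_coverage(total_bases, assembled_length)

print(f"sequenced bases:  {total_bases:,}")
print(f"assembled length: {assembled_length:,}")
print(f"estimated depth:  {coverage:.1f}x")
```

Under that assumption the run corresponds to roughly 40-fold coverage. Reads that were discarded or never assembled would lower the effective depth.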
