A computational genomics pipeline for prokaryotic sequencing projects

Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data.

Results: We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes.

Availability and implementation: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.

Contact: king.jordan@biology.gatech.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
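
The "gene predictor combining" step named in the Results can be illustrated with a minimal sketch. The abstract does not say how the pipeline merges predictions, so the strategy here is an assumption: a majority vote in which calls from different predictors are treated as the same gene when they share a strand and a 3' (stop) coordinate. The names `Gene`, `stop_key`, `combine_predictions` and `min_votes` are hypothetical, and Python is used only for illustration (the pipeline itself is implemented in Perl, Bourne Shell and MySQL).

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    strand: str  # "+" or "-"
    start: int   # leftmost coordinate, 1-based
    end: int     # rightmost coordinate, inclusive


def stop_key(gene):
    """Identify a gene by its strand and the coordinate of its 3' end."""
    return (gene.strand, gene.end if gene.strand == "+" else gene.start)


def combine_predictions(predictions, min_votes=2):
    """Majority-vote merge of gene calls (an assumed strategy, not the pipeline's).

    predictions: dict mapping predictor name -> list of Gene.
    A gene is kept when at least min_votes predictors report its stop.
    """
    votes = defaultdict(list)
    for genes in predictions.values():
        # One vote per predictor per stop position (last call wins).
        per_stop = {stop_key(g): g for g in genes}
        for key, gene in per_stop.items():
            votes[key].append(gene)

    consensus = []
    for calls in votes.values():
        if len(calls) < min_votes:
            continue
        # Prefer the most frequent exact call; ties go to the longest gene.
        counts = Counter(calls)
        best = max(counts, key=lambda g: (counts[g], g.end - g.start))
        consensus.append(best)
    return sorted(consensus, key=lambda g: (g.start, g.end))


if __name__ == "__main__":
    calls = {
        "predictor_a": [Gene("+", 100, 400), Gene("-", 500, 800)],
        "predictor_b": [Gene("+", 130, 400)],
        "predictor_c": [Gene("+", 100, 400), Gene("+", 900, 1200)],
    }
    for gene in combine_predictions(calls):
        print(gene)
```

Keying on the stop coordinate reflects the common situation in which predictors agree on where a gene ends but disagree on its start; a real combiner might instead weight predictors or resolve starts with additional evidence.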
