JANE: efficient mapping of prokaryotic ESTs and variable length sequence reads on related template genomes

Background: ESTs or variable-length sequence reads can be available in prokaryotic studies well before a complete genome is known. Use cases include (i) transcriptome studies and (ii) single-cell sequencing of bacteria. Without suitable software, their further analysis and mapping would have to await finalization of the corresponding genome.

Results: The tool JANE rapidly maps ESTs or variable-length sequence reads from prokaryotic sequencing and transcriptome efforts to related template genomes. It provides an easy-to-use graphical interface for information retrieval and a toolkit for function prediction of ESTs or nucleotide sequences. Furthermore, for rapid mapping we developed an enhanced sequence alignment algorithm that reassembles and evaluates the high-scoring pairs provided by the BLAST algorithm. This achieves rapid assembly of sequence reads or mapped ESTs on the template genome and replacement of the template by them. This is illustrated (i) by data from Staphylococci as well as from a Blattabacteria sequencing effort, and (ii) by mapping single-cell sequencing reads from poribacteria to Rhodopirellula baltica SH1, a representative of the sister phylum. The algorithm has been implemented in a web server accessible at http://jane.bioapps.biozentrum.uni-wuerzburg.de.

Conclusion: With JANE, rapid mapping of prokaryotic ESTs or sequence reads is achieved even without knowing the cognate genome sequence.
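The abstract does not spell out how high-scoring pairs (HSPs) are reassembled and evaluated. As a purely illustrative sketch, and not JANE's actual algorithm, one common way to combine BLAST HSPs is to chain those that are colinear and non-overlapping on both the read and the template, then keep the chain with the highest summed score. The `HSP` fields and the function name `best_chain` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HSP:
    """One high-scoring pair: inclusive coordinates on query and template."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: float


def best_chain(hsps: List[HSP]) -> List[HSP]:
    """Return the colinear, non-overlapping chain of HSPs with maximal total score.

    Simple O(n^2) dynamic programme; assumes all HSPs lie on the same strand.
    """
    if not hsps:
        return []
    # Sorting by query start guarantees any valid predecessor comes earlier.
    ordered = sorted(hsps, key=lambda h: (h.q_start, h.s_start))
    best = [h.score for h in ordered]   # best chain score ending at each HSP
    prev = [-1] * len(ordered)          # back-pointers for reconstruction
    for i, cur in enumerate(ordered):
        for j in range(i):
            p = ordered[j]
            # p may precede cur only if it ends before cur starts on both axes.
            if p.q_end < cur.q_start and p.s_end < cur.s_start:
                candidate = best[j] + cur.score
                if candidate > best[i]:
                    best[i] = candidate
                    prev[i] = j
    # Walk back from the highest-scoring chain end.
    i = max(range(len(ordered)), key=best.__getitem__)
    chain: List[HSP] = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    chain.reverse()
    return chain


example = [
    HSP(1, 100, 1001, 1100, 90.0),
    HSP(120, 200, 1130, 1210, 70.0),
    HSP(50, 150, 5000, 5100, 95.0),   # strong hit, but elsewhere on the template
    HSP(210, 260, 1220, 1270, 40.0),
]
chain = best_chain(example)
```

In this made-up example, the isolated hit with the single highest score is dropped in favour of three weaker hits that line up consistently on the template. A real mapper would additionally have to handle reverse-strand hits, partial overlaps and gap penalties.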
