INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles

Background: Next-generation sequencing technologies have substantially increased the throughput of microbial genome sequencing. A variety of experimental and computational methods are used to functionally annotate newly sequenced microbial genomes, and integrating information from different sources is a powerful way to enhance such annotation. Functional analysis of microbial genomes, which is necessary for downstream experiments, depends crucially on this annotation, but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes.

Results: We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO allows users to construct complex queries and combine annotations from multiple sources, from the genomic sequence level through to the protein domain, gene ontology and pathway levels. The data warehouse is intended to be populated with information from genomes of pure cultures and uncultured single cells of Red Sea Bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea, extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of using the system to gain new insights into specific aspects of the unique lifestyle of these organisms and their adaptations to extreme environments.

Conclusions: INDIGO enables comprehensive integration of information from various resources for the annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes, and it is intended to serve as a resource dedicated to Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.
