DOOR: a prokaryotic operon database for genome analyses and functional inference

The rapid accumulation of fully sequenced prokaryotic genomes provides unprecedented information for systematic biological studies of bacteria and archaea. Operons are the basic functional units for conducting such studies. Here, we review DOOR (the Database of prOkaryotic OpeRons), an operon database that we previously developed and continue to update. The database currently contains 6 975 454 computationally predicted operons in 2072 complete genomes. It also contains: (i) transcriptional units for 24 genomes, derived from publicly available transcriptomic data; (ii) orthologous gene mapping across genomes; (iii) 6408 cis-regulatory motifs for transcription factors of some operons in 203 genomes; (iv) 3 456 718 Rho-independent terminators for 2072 genomes; and (v) a suite of tools supporting applications of the predicted operons. In this review, we explain how these data are computationally derived and demonstrate how they can be used to derive a wide range of higher-level information needed for systems biology studies that tackle complex and fundamental biological questions.
