Finding approximate gene clusters with Gecko 3

Gene-order-based comparison of multiple genomes provides signals for the functional analysis of genes and for the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to a few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, an open-source software tool for finding gene clusters in hundreds of bacterial genomes, which comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations, and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters by reviewing the literature and comparing them to a database of operons; we also detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in less than 40 min.
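
The abstract does not spell out the gene cluster model, so the following is only a toy sketch of one common way to formalize an approximate gene cluster, not the algorithm implemented in Gecko 3. It assumes that a cluster is given as a set of reference genes, and that an interval of another genome counts as an approximate occurrence if its gene content differs from that set by at most a fixed number of missing plus additional genes. The function name `approximate_occurrences`, the parameter `max_dist` and the distance definition are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def approximate_occurrences(
    reference: Set[str], genome: List[str], max_dist: int
) -> List[Tuple[int, int, int]]:
    """Return (start, end, distance) for every interval genome[start..end]
    whose gene set differs from `reference` by at most `max_dist` genes.

    Distance (an assumption of this sketch) = number of reference genes
    missing from the interval + number of non-reference genes in it.
    Gene order inside the interval is ignored.
    """
    hits = []
    n = len(genome)
    for start in range(n):
        content: Set[str] = set()
        for end in range(start, n):
            content.add(genome[end])
            extra = len(content - reference)
            if extra > max_dist:
                # Extending the interval can only add more extra genes.
                break
            missing = len(reference - content)
            dist = missing + extra
            if dist <= max_dist:
                hits.append((start, end, dist))
    return hits


if __name__ == "__main__":
    ref = {"a", "b", "c"}
    # Same gene content in a different order: an exact occurrence.
    print(approximate_occurrences(ref, ["a", "c", "b"], max_dist=0))
    # One inserted gene "y" and one missing gene "c".
    print(approximate_occurrences({"a", "b", "c", "d"},
                                  ["x", "a", "b", "y", "d", "z"], max_dist=2))
```

This brute-force scan is quadratic in genome length and is meant only to make the idea of tolerating missing and inserted genes concrete. Handling hundreds of genomes would need a far more efficient algorithm and a statistical evaluation of the hits.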
