ARBitrator: a software pipeline for on-demand retrieval of auto-curated nifH sequences from GenBank

MOTIVATION: Studies of the biochemical functions and activities of uncultivated microorganisms in the environment require analysis of DNA sequences, both for phylogenetic characterization and for the development of sequence-based assays that detect those microorganisms. The number of sequences for genes that indicate environmentally important functions, such as nitrogen (N2) fixation, has grown rapidly over the past few decades. Obtaining these sequences from the National Center for Biotechnology Information's GenBank database is problematic because of annotation errors, nomenclature variation and paralogues; moreover, GenBank's structure and tools are not conducive to searching solely by function. For some genes, such as the nifH gene commonly used to assess community potential for N2 fixation, manual collection and curation are becoming intractable because GenBank holds a large number of sequences and a large number of highly similar paralogues. If analysis is to keep pace with sequence discovery, an automated retrieval and curation system is necessary.

RESULTS: ARBitrator uses a two-step process: a broad collection of potential homologues, followed by screening with a best-hit strategy against conserved domains. A total of 34 420 nifH sequences were identified in GenBank as of November 20, 2012. The false-positive rate is ∼0.033%. ARBitrator rapidly updates a public nifH sequence database, and we show that it can be adapted for other genes.

AVAILABILITY AND IMPLEMENTATION: Java source and executable code are freely available to non-commercial users at http://pmc.ucsc.edu/~wwwzehr/research/database/.

CONTACT: zehrj@ucsc.edu

SUPPLEMENTARY INFORMATION: Supplementary information is available at Bioinformatics online.
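
The screening step described under Results can be illustrated with a minimal Java sketch. It is not ARBitrator's code: the class `BestHitScreen`, the `DomainHit` type, the accession strings, the domain labels and the scores are all hypothetical, and the sketch assumes that "best hit" means the domain match with the highest score for a given candidate. A candidate from the broad collection step is kept only if its best hit is one of the target domains, so a paralogue that matches the target domain weakly but another domain strongly is rejected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class BestHitScreen {
    /** One candidate-versus-domain match; labels and scores are illustrative only. */
    static final class DomainHit {
        final String domain;
        final double score;

        DomainHit(String domain, double score) {
            this.domain = domain;
            this.score = score;
        }
    }

    /** Returns the highest-scoring hit, or empty if the candidate matched no domain. */
    static Optional<DomainHit> bestHit(List<DomainHit> hits) {
        return hits.stream().max(Comparator.comparingDouble(h -> h.score));
    }

    /** Keeps only the candidates whose best hit is one of the target domains. */
    static List<String> screen(Map<String, List<DomainHit>> hitsByCandidate,
                               Set<String> targetDomains) {
        List<String> accepted = new ArrayList<>();
        for (Map.Entry<String, List<DomainHit>> entry : hitsByCandidate.entrySet()) {
            Optional<DomainHit> best = bestHit(entry.getValue());
            if (best.isPresent() && targetDomains.contains(best.get().domain)) {
                accepted.add(entry.getKey());
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Hypothetical output of the broad collection step: candidate -> domain hits.
        Map<String, List<DomainHit>> candidates = new LinkedHashMap<>();
        List<DomainHit> first = new ArrayList<>();
        first.add(new DomainHit("target_domain", 310.0));
        first.add(new DomainHit("paralogue_domain", 120.0));
        candidates.put("CANDIDATE_A", first);

        List<DomainHit> second = new ArrayList<>();
        second.add(new DomainHit("target_domain", 95.0));
        second.add(new DomainHit("paralogue_domain", 280.0));
        candidates.put("CANDIDATE_B", second);

        Set<String> targets = new java.util.HashSet<>();
        targets.add("target_domain");
        System.out.println(screen(candidates, targets));
    }
}
```

Comparing each candidate's hits against one another, rather than applying a fixed score cutoff to the target domain alone, is what lets this kind of filter separate true homologues from highly similar paralogues.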
