Finding the missing honey bee genes: lessons learned from a genome upgrade

Background: First-generation genome sequence assemblies and annotations have had a significant impact on our understanding of the biology of the sequenced species, the phylogenetic relationships among species, and the study of populations within and across species, and have informed the biology of humans. As only a few metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution, which affected the quality of the assembly in some regions, and for having fewer genes in the initial gene set (OGSv1.0) than would be expected based on other sequenced insect genomes.

Results: Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete, and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remainder were inferred based on new RNAseq and protein data.

Conclusions: Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.

Dan Graur | Mario Stanke | Monica C Munoz-Torres | Eran Elhaik | Martin Beye | Roderic Guigo | Victor Solovyev | Eckart Stolle | Radhika S Khetani | Christine G Elsik | Matthew E Hudson | Christopher P Childers | Matthias Van Vaerenbergh | Olav Rueppell | Vandita Joshi | Dianhui Zhu | Kim C Worley | Anna K Bennett | Francisco Camara | Dirk C de Graaf | Griet Debyser | Jixin Deng | Bart Devreese | Jay D Evans | Leonard J Foster | Katharina Jasmin Hoff | Michael E Holder | Greg J Hunt | Huaiyang Jiang | Peter Kosarev | Christie L Kovar | Jian Ma | Ryszard Maleszka | Robin F A Moritz | Terence D Murphy | Donna M Muzny | Irene F Newsham | Justin T Reese | Hugh M Robertson | Gene E Robinson | Jennifer M Tsuruda | Robert M Waterhouse | Daniel B Weaver | Charles W Whitfield | Yuanqing Wu | Evgeny M Zdobnov | Lan Zhang | Richard A Gibbs
