ASGARD: an open-access database of annotated transcriptomes for emerging model arthropod species

The increased throughput and decreased cost of next-generation sequencing (NGS) have shifted the bottleneck in genomic research from sequencing to annotation, analysis and accessibility. This is particularly challenging for research communities working on organisms that lack the basic infrastructure of a sequenced genome, or an efficient way to utilize whatever sequence data may be available. Here we present a new database, the Assembled Searchable Giant Arthropod Read Database (ASGARD). This database is a repository and search engine for transcriptomic data from arthropods that are of high interest to multiple research communities but currently lack sequenced genomes. We demonstrate the functionality and utility of ASGARD using de novo assembled transcriptomes from the milkweed bug Oncopeltus fasciatus, the cricket Gryllus bimaculatus and the amphipod crustacean Parhyale hawaiensis. We have annotated these transcriptomes to provide putative orthology assignments, coding region predictions, protein domain identification and Gene Ontology (GO) term annotation for all possible assembly products. ASGARD allows users to search all assemblies by orthology annotation, by GO term annotation or with the Basic Local Alignment Search Tool (BLAST). User-friendly features of ASGARD include search term auto-completion suggestions based on database content, the ability to download assembly product sequences in FASTA format, direct links to NCBI data for predicted orthologs and graphical representation of the location of protein domains and matches to similar sequences from the NCBI non-redundant database. ASGARD will be a useful repository for transcriptome data from future NGS studies on these and other emerging model arthropods, regardless of sequencing platform, assembly or annotation status. This database thus provides easy, one-stop access to multi-species annotated transcriptome information. We anticipate that this database will be useful for members of multiple research communities, including developmental biology, physiology, evolutionary biology, ecology, comparative genomics and phylogenomics.

Database URL: asgard.rc.fas.harvard.edu
