Apollo: Democratizing genome annotation

Genome annotation is the process of identifying the location and function of a genome's encoded features. Improving the biological accuracy of annotation is a complex and iterative process requiring researchers to review and incorporate multiple sources of information such as transcriptome alignments, predictive models based on sequence profiles, and comparisons to features found in related organisms. Because rapidly decreasing costs are enabling an ever-growing number of scientists to incorporate sequencing as a routine laboratory technique, there is widespread demand for tools that can assist in the deliberative analytical review of genomic information. To this end, we present Apollo, an open-source software package that enables researchers to efficiently inspect and refine the precise structure and role of genomic features in a graphical browser-based platform. Apollo’s newer user interface features include support for real-time collaboration: distributed users can simultaneously edit the same encoded features and instantly see the updates made by other researchers on the same region, in a manner similar to Google Docs. Its technical architecture enables Apollo to be integrated into multiple existing genomic analysis pipelines and heterogeneous laboratory workflow platforms. Finally, we consider the implications that Apollo and related applications may have for how the results of genome research are published and made accessible.
