The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals

Bgee is a database for retrieving and comparing gene expression patterns across multiple animal species. It integrates multiple data types (RNA-Seq, Affymetrix, in situ hybridization, and EST data). It is based exclusively on curated healthy wild-type expression data (e.g., no gene knock-out, no treatment, no disease), to provide a comparable reference of normal gene expression. Curation includes very large datasets such as GTEx (re-annotation of samples as “healthy” or not) as well as many small ones. Data are integrated and made comparable between species through consistent data annotation and processing, and through calls of presence/absence of expression, along with expression scores. As a result, Bgee is capable of detecting the conditions of expression of any single gene, accommodating any data type and species. Bgee provides several tools for analyses, allowing, e.g., automated comparisons of gene expression patterns within and between species, retrieval of the preferred conditions of expression of any gene, or enrichment analyses of conditions with expression of sets of genes. Bgee release 14.1 includes 29 animal species, and is available at https://bgee.org/ and through its Bioconductor R package BgeeDB.
