WTFgenes: What's The Function of these genes? Static sites for model-based gene set analysis

A common way to interpret experimentally identified lists of genes is to look for enrichment of genes associated with particular ontology terms. The most widely used technique is based on the hypergeometric distribution; more recently, a model-based approach was proposed. These approaches typically must be run either with downloaded software or on a server. We develop a collapsed likelihood for model-based gene set analysis and present WTFgenes, an implementation of both the hypergeometric and model-based approaches that can be published as a static site, with all computation run in JavaScript in the user's web browser. Apart from hosting files, no server resources are required: the site can, for example, be served directly from Amazon S3 or GitHub Pages. A C++11 implementation yielding identical results runs roughly twice as fast as the JavaScript version.

WTFgenes is available from https://github.com/evoldoers/wtfgenes under the BSD3 license. A demonstration for the Gene Ontology is available at https://evoldoers.github.io/wtfgo. Contact: Ian Holmes ihholmes+wtfgenes@gmail.com.
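The two kinds of scoring contrasted above can be made concrete with a small Python sketch. WTFgenes itself is written in JavaScript and C++11, and this is not its code. The first function is the standard hypergeometric upper-tail p-value for one term. The second is a model-based score in which a gene is "on" if any active term annotates it and the observed list is a noisy readout of the "on" genes. Here "collapsed" is assumed to mean that the false-positive and false-negative rates are integrated out under Beta priors. That interpretation, the function names, the prior defaults and the toy gene and term names are all hypothetical and for illustration only; it is not necessarily WTFgenes's exact formulation.

```python
import math

# Hypergeometric enrichment test for a single term: probability of seeing at
# least k annotated genes in a study set of size n, drawn without replacement
# from N genes of which K are annotated to the term.
def hypergeom_upper_tail(k, N, K, n):
    total = math.comb(N, n)  # math.comb requires Python 3.8+
    hits = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, min(K, n) + 1))
    return hits / total

def log_beta(a, b):
    # Log of the Beta function B(a, b), computed via log-gamma.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

# Model-based score for one hypothesis (a set of active terms), with the
# false-positive rate ~ Beta(a1, a0) and false-negative rate ~ Beta(b1, b0)
# integrated out. ASSUMPTION: this parameterization is illustrative only.
def collapsed_log_likelihood(active_terms, annotations, genes, observed,
                             fp_prior=(1.0, 1.0), fn_prior=(1.0, 1.0)):
    on = set()
    for term in active_terms:
        on |= annotations[term]        # gene is "on" if any active term hits it
    tp = len(on & observed)            # on and observed
    fn = len(on - observed)            # on but missed
    fp = len(observed - on)            # observed but off
    tn = len(genes) - tp - fn - fp     # off and not observed
    a1, a0 = fp_prior
    b1, b0 = fn_prior
    return (log_beta(a1 + fp, a0 + tn) - log_beta(a1, a0)
            + log_beta(b1 + fn, b0 + tp) - log_beta(b1, b0))

# Toy data (hypothetical gene and term names).
genes = set("ABCDEFGHIJ")
annotations = {"t1": {"A", "B", "C"}, "t2": {"D", "E", "F", "G"}}
observed = {"A", "B", "C", "H"}

p_t1 = hypergeom_upper_tail(k=3, N=10, K=3, n=4)   # 7/210, about 0.033
ll_t1 = collapsed_log_likelihood({"t1"}, annotations, genes, observed)  # -log(224)
ll_t2 = collapsed_log_likelihood({"t2"}, annotations, genes, observed)  # -log(525)
```

On this toy data, the hypothesis that t1 is active scores higher than the hypothesis that t2 is active, which agrees with t1's small hypergeometric p-value. A complete model-based analysis would also need a prior over which terms are active and a way to search or sample over term sets. Both are omitted from this sketch.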
