EGAD: Ultra-fast functional analysis of gene networks

Summary: Evaluating gene networks with respect to known biology is a common but often computationally costly task. Many computational experiments are difficult to apply exhaustively in network analysis because of their run-times. To permit high-throughput analysis of gene networks, we have implemented a set of very efficient tools to calculate functional properties in networks based on guilt-by-association methods. EGAD (Extending 'Guilt-by-Association' by Degree) allows gene networks to be evaluated with respect to hundreds or thousands of gene sets. The methods predict novel members of gene groups, assess how well a gene network groups known sets of genes, and determine the degree to which generic predictions drive performance. By allowing fast evaluations, whether of random sets or real functional ones, EGAD provides the user with an assessment of performance that can easily be used in controlled evaluations across many parameters.

Availability and Implementation: The software package is freely available at https://github.com/sarbal/EGAD and is implemented for use in R and Matlab. The package is also freely available under the LGPL license from the Bioconductor web site (http://bioconductor.org).

Contact: JGillis@cshl.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
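The three kinds of guilt-by-association evaluation summarized above can be illustrated with a small Python sketch. It is not EGAD's implementation (the package itself is written in R and Matlab), and the function names `neighbor_scores`, `neighbor_voting_auroc`, `degree_auroc` and `predict_novel_members` are hypothetical. The details are assumptions about a typical setup: each gene is scored by its edge weight to known members divided by its node degree, the known members are split into folds for cross-validation, performance is a rank-based AUROC, and node degree alone serves as the "generic" baseline predictor.

```python
"""Minimal guilt-by-association sketch (hypothetical helpers, not EGAD's code)."""


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores receive average ranks."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores starting at position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for lab in labels if lab)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum = sum(r for r, lab in zip(ranks, labels) if lab)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def neighbor_scores(adj, seeds):
    """Score each gene by its edge weight to the seed genes, divided by its degree."""
    scores = []
    for row in adj:
        degree = sum(row)
        vote = sum(row[s] for s in seeds)
        scores.append(vote / degree if degree > 0 else 0.0)
    return scores


def neighbor_voting_auroc(adj, members, n_folds=3):
    """Mean AUROC over folds: hide one fold of members, recover it from the rest."""
    positives = sorted(members)
    if len(positives) < n_folds:
        raise ValueError("need at least one known member per fold")
    folds = [positives[f::n_folds] for f in range(n_folds)]
    fold_aurocs = []
    for held_out in folds:
        held = set(held_out)
        train = set(positives) - held
        scores = neighbor_scores(adj, train)
        # Rank held-out members against all non-members; training genes are excluded.
        candidates = [g for g in range(len(adj)) if g not in train]
        labels = [g in held for g in candidates]
        fold_aurocs.append(auroc([scores[g] for g in candidates], labels))
    return sum(fold_aurocs) / len(fold_aurocs)


def degree_auroc(adj, members):
    """Generic baseline: how well node degree alone ranks the set's members."""
    degrees = [sum(row) for row in adj]
    labels = [g in members for g in range(len(adj))]
    return auroc(degrees, labels)


def predict_novel_members(adj, members, top_k=5):
    """Rank non-members by neighbor-voting score using all known members."""
    scores = neighbor_scores(adj, members)
    others = [g for g in range(len(adj)) if g not in members]
    others.sort(key=lambda g: scores[g], reverse=True)  # stable for ties
    return [(g, scores[g]) for g in others[:top_k]]


if __name__ == "__main__":
    # Toy network: genes 0-2 and 3-5 form two triangles joined by the edge 2-3.
    adj = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    gene_set = {0, 1, 2}
    print("cross-validated AUROC:", neighbor_voting_auroc(adj, gene_set))
    print("node-degree AUROC:", degree_auroc(adj, gene_set))
    print("top candidates:", predict_novel_members(adj, gene_set, top_k=2))
```

On the toy network the set {0, 1, 2} is recovered perfectly (cross-validated AUROC 1.0) while node degree alone scores 0.5, so its performance comes from specific connections rather than from generic hub-like predictions; when the two AUROCs are close, degree may be what is driving performance.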
