GenomeRunner: automating genome exploration

MOTIVATION: One of the challenges in interpreting high-throughput genomic studies, such as genome-wide association, microarray or ChIP-seq studies, is their open-ended nature. Once a set of experimentally identified regions is found to be statistically significant, at least two questions arise: (i) Besides P-value, do any of these significant regions stand out in terms of biological implications? (ii) Does the set of significant regions, as a whole, have anything in common genome-wide? These questions are difficult to address because the number of annotated genomic features (e.g. single nucleotide polymorphisms, transcription factor binding sites, methylation peaks) keeps growing, and it is difficult to know a priori which features would be most fruitful to analyze. Our goal is to partially automate this process, so that associations between experimental features and annotated genomic regions can be examined in a hypothesis-free, data-driven manner.

RESULTS: We created GenomeRunner, a tool for automating annotation and enrichment analysis of genomic features of interest (FOIs) with annotated genomic features (GFs) in different organisms. Besides simple association of FOIs with known GFs, GenomeRunner tests whether the enriched FOIs, as a group, are statistically associated with a large and growing set of genomic features.

AVAILABILITY: GenomeRunner setup files and source code are freely available at http://sourceforge.net/projects/genomerunner.

CONTACT: mikhail-dozmorov@omrf.org; Jonathan-Wren@omrf.org; jdwren@gmail.com

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
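To make the idea of testing FOIs against a GF for enrichment concrete, the sketch below shows one generic way such a test can be set up. It is not GenomeRunner's implementation: the abstract does not state which statistical test the tool uses, so a simple Monte Carlo approach is assumed here. The function names (`overlaps`, `count_hits`, `enrichment_p_value`), the single-chromosome model, and the half-open `(start, end)` interval convention are all hypothetical choices made for illustration.

```python
import random


def overlaps(a, b):
    """Return True if half-open intervals a=(start, end) and b=(start, end) overlap."""
    return a[0] < b[1] and b[0] < a[1]


def count_hits(fois, gfs):
    """Count the FOIs that overlap at least one GF (naive O(n*m) scan)."""
    return sum(any(overlaps(f, g) for g in gfs) for f in fois)


def enrichment_p_value(fois, gfs, chrom_length, n_sim=1000, seed=0):
    """Estimate how often randomly placed FOIs hit GFs at least as often as observed.

    Assumption: all intervals lie on one chromosome of length chrom_length,
    and the null model places each FOI uniformly at random, keeping its width.
    """
    rng = random.Random(seed)
    observed = count_hits(fois, gfs)
    at_least_as_extreme = 0
    for _ in range(n_sim):
        shuffled = []
        for start, end in fois:
            width = end - start
            new_start = rng.randint(0, chrom_length - width)
            shuffled.append((new_start, new_start + width))
        if count_hits(shuffled, gfs) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated P-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_sim + 1)


# Hypothetical toy data: three FOIs, two annotated GF regions.
fois = [(1100, 1110), (1500, 1510), (50500, 50510)]
gfs = [(1000, 2000), (50000, 51000)]
observed, p_value = enrichment_p_value(fois, gfs, chrom_length=1_000_000)
print(observed, p_value)
```

Repeating such a test over many GF tracks, with an appropriate correction for multiple testing, is one way to approach the hypothesis-free screening that the abstract describes.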
