Public data and open source tools for multi-assay genomic investigation of disease

Molecular interrogation of a biological sample through DNA sequencing, RNA and microRNA profiling, proteomics and other assays has the potential to provide a systems-level approach to predicting treatment response and disease progression, and to developing precision therapies. Large publicly funded projects have generated extensive and freely available multi-assay data resources; however, bioinformatic and statistical methods for the analysis of such experiments are still nascent. We review multi-assay genomic data resources in clinical oncology, pharmacogenomics and other perturbation experiments, population genomics, regulatory genomics and other areas, along with tools for data acquisition. Finally, we review bioinformatic tools that are explicitly geared toward integrative genomic data visualization and analysis. This review provides starting points for accessing publicly available data and tools to support development of needed integrative methods.
