Large-scale data integration framework provides a comprehensive view on glioblastoma multiforme

Background: Coordinated efforts to collect large-scale data sets provide a basis for systems-level understanding of complex diseases. To translate these fragmented and heterogeneous data sets into knowledge and medical benefits, advanced computational methods for data analysis, integration and visualization are needed.

Methods: We introduce a novel data integration framework, Anduril, for translating fragmented large-scale data into testable predictions. The Anduril framework allows rapid integration of heterogeneous data with state-of-the-art computational methods and existing knowledge in bio-databases. Anduril automatically generates thorough summary reports and a website that shows the most relevant features of each gene at a glance, allows sorting of data based on different parameters, and provides direct links to more detailed data on genes, transcripts or genomic regions. Anduril is open-source; all methods and documentation are freely available.

Results: Using Anduril, we have integrated multidimensional molecular and clinical data from 338 subjects with glioblastoma multiforme, one of the deadliest and most poorly understood cancers. The central objective of our approach is to identify genetic loci and genes that have a significant survival effect. Our results suggest several novel genetic alterations linked to glioblastoma multiforme progression and, more specifically, reveal Moesin as a novel glioblastoma multiforme-associated gene that has a strong survival effect and whose depletion in vitro significantly inhibited cell proliferation. All analysis results are available as a comprehensive website.

Conclusions: Our results demonstrate that integrated analysis and visualization of multidimensional and heterogeneous data by Anduril enables drawing conclusions on the functional consequences of large-scale molecular data. Many of the identified genetic loci and genes that have a significant survival effect have not been reported earlier in the context of glioblastoma multiforme. Thus, in addition to a generally applicable novel methodology, our results provide several glioblastoma multiforme candidate genes for further studies.

Anduril is available at http://csbi.ltdk.helsinki.fi/anduril/
The glioblastoma multiforme analysis results are available at http://csbi.ltdk.helsinki.fi/anduril/tcga-gbm/
