Why Is There a Lack of Consensus on Molecular Subgroups of Glioblastoma? Understanding the Nature of Biological and Statistical Variability in Glioblastoma Expression Data

Introduction: Gene expression patterns that characterize clinically relevant molecular subgroups of glioblastoma are difficult to reproduce. We suspect that a combination of biological and analytic factors confounds the interpretation of glioblastoma expression data. We seek to clarify the nature and relative contributions of these factors, to focus further investigation, and to improve the accuracy and consistency of translational glioblastoma analyses.

Methods: We analyzed gene expression and clinical data for 340 glioblastomas in The Cancer Genome Atlas (TCGA). We developed a logic model of the potential sources of biological, technical, and analytic variability, and we used standard linear classifiers and linear dimensionality-reduction algorithms to investigate the nature and relative contribution of each factor.

Results: Commonly described sources of classification error, including individual sample characteristics, batch effects, and analytic and technical noise, make measurable but proportionally minor contributions to inconsistent molecular classification. Our analysis suggests that three previously underappreciated factors may account for a larger fraction of classification errors: inherent non-linear, non-orthogonal relationships among the classifying genes, combined with classification algorithms that assume linearity; skewed data distributions that are assumed to be Gaussian; and biological variability (noise) among tumors, of which we propose three types.

Conclusions: Our analysis of the TCGA data demonstrates a contributory role for technical factors in the inconsistent molecular classification of glioblastoma, but it also suggests that biological variability, non-Gaussian data distributions, and non-linear relationships among genes may be responsible for a proportionally larger component of classification error. These findings may have important implications both for glioblastoma research and for the translational application of other large-volume biological databases.
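Two of the factors named above, skewed distributions treated as Gaussian and non-linear gene-gene relationships analyzed with linear tools, can be illustrated on toy data. The following is a minimal sketch on simulated values only. The log-normal expression model, the U-shaped gene pair, and the helper names `skewness` and `pearson` are hypothetical choices made for illustration, not the authors' method or TCGA data. It shows that raw log-normal values are strongly skewed while their logarithms are roughly symmetric, and that a gene fully determined by another can still show essentially zero Pearson (linear) correlation with it.

```python
import math
import random
import statistics


def skewness(values):
    """Moment-based sample skewness g1; close to 0 for a symmetric (e.g. Gaussian) sample."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson(x, y):
    """Pearson correlation: captures only the *linear* association between x and y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# 1) Skewed "expression" values under a hypothetical log-normal model (not TCGA data).
rng = random.Random(0)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
logged = [math.log(v) for v in raw]  # log transform, one common remedy for right skew

# 2) Two hypothetical genes with a U-shaped (non-linear) dependence:
#    gene_b is completely determined by gene_a, yet their linear correlation is ~0.
gene_a = [i / 10 for i in range(-30, 31)]  # symmetric grid from -3.0 to 3.0
gene_b = [a * a for a in gene_a]

if __name__ == "__main__":
    print(f"skewness, raw scale: {skewness(raw):.2f}")
    print(f"skewness, log scale: {skewness(logged):.2f}")
    print(f"Pearson r, U-shaped gene pair: {pearson(gene_a, gene_b):.3f}")
```

A linear classifier or principal-components projection sees only the second-order, linear structure that `pearson` measures, so in this toy setting the dependence between `gene_a` and `gene_b` would be invisible to it. Whether a log transform adequately normalizes real array data is itself an assumption that should be checked rather than taken for granted.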
