Quantitative comparison of microarray experiments with published leukemia related gene expression signatures

Background: Multiple gene expression signatures derived from microarray experiments have been published in the field of leukemia research. Comparing these signatures with results from new experiments is useful both for verification and for interpretation of the results obtained. Currently, the percentage of overlapping genes is frequently used to compare published gene signatures against a signature derived from a new experiment. However, it has been shown that the percentage of overlapping genes is of limited use for comparing two experiments, because gene signatures vary with the array platform and with assay-specific parameters. Here, we present a robust approach for a systematic and quantitative comparison of published gene expression signatures with an example query dataset.

Results: A database storing 138 leukemia-related published gene signatures was designed. Each gene signature was manually annotated with terms from a leukemia-specific taxonomy. Two analysis steps are implemented to compare a new microarray dataset with the results from previous experiments stored and curated in the database. First, the global test method is applied to assess the gene signatures and to rank them. In a subsequent analysis step, the focus shifts from single gene signatures to chromosomal aberrations or molecular mutations as modeled in the taxonomy. Potentially interesting disease characteristics are detected based on the ranking of the gene signatures associated with these aberrations in the database. Two example analyses are presented. An implementation of the approach is freely available as a web-based application.

Conclusions: The presented approach helps researchers to systematically integrate the knowledge derived from numerous microarray experiments into the analysis of a new dataset. Using example leukemia datasets, we demonstrate that this approach detects related experiments as well as related molecular mutations and may help to interpret new microarray data.
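The first analysis step (scoring each stored signature against a query dataset and ranking the signatures) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a simplified, permutation-based variant of a global-test-style statistic for a two-group comparison, and all names (`global_test_stat`, `permutation_pvalue`, `rank_signatures`, the synthetic genes `g0` to `g9`, and the two toy signatures) are hypothetical and chosen only for illustration.

```python
import numpy as np


def global_test_stat(X, y):
    """Global-test-style score Q = r' X X' r / m.

    X: samples x genes expression matrix restricted to one signature.
    y: numeric phenotype vector (e.g. 0/1 group labels).
    Assumption: simplified form without covariates or variance scaling.
    """
    r = y - y.mean()             # phenotype residuals under the null model
    Xc = X - X.mean(axis=0)      # center each gene across samples
    s = Xc.T @ r                 # per-gene covariance-like scores
    return float(s @ s) / X.shape[1]


def permutation_pvalue(X, y, n_perm=500, seed=0):
    """Permutation p-value: how often shuffled labels score at least as high."""
    rng = np.random.default_rng(seed)
    observed = global_test_stat(X, y)
    hits = sum(
        global_test_stat(X, rng.permutation(y)) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


def rank_signatures(expr, y, signatures, gene_index, n_perm=500):
    """Score every signature on the query data and sort by ascending p-value."""
    results = []
    for name, genes in signatures.items():
        cols = [gene_index[g] for g in genes if g in gene_index]
        if not cols:             # signature shares no genes with the query array
            continue
        results.append((name, permutation_pvalue(expr[:, cols], y, n_perm=n_perm)))
    return sorted(results, key=lambda item: item[1])


# Synthetic query dataset: 40 samples, 10 genes, two groups of 20 samples.
rng = np.random.default_rng(42)
y = np.repeat([0.0, 1.0], 20)
expr = rng.normal(size=(40, 10))
expr[y == 1, :5] += 2.0          # only genes g0..g4 differ between the groups

gene_index = {f"g{i}": i for i in range(10)}
signatures = {
    "sig_related": [f"g{i}" for i in range(5)],
    "sig_unrelated": [f"g{i}" for i in range(5, 10)],
}

ranked = rank_signatures(expr, y, signatures, gene_index)
for name, p in ranked:
    print(name, round(p, 4))
```

Because the score pools evidence across all genes of a signature instead of counting overlapping genes, a signature can rank highly even when few of its genes would pass a single-gene significance threshold in the query data. In practice the p-values would also need a multiple-testing correction across all stored signatures.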
