Predictive response-relevant clustering of expression data provides insights into disease processes

This article describes and illustrates a novel method of microarray data analysis that couples model-based clustering and binary classification to form clusters of 'response-relevant' genes; that is, genes that are informative when discriminating between the different values of the response. Predictions are subsequently made in a probit regression model using an appropriate statistical summary of each gene cluster, which we call the 'meta-covariate' representation of the cluster. We first illustrate this method by analysing a leukaemia expression dataset, before focusing closely on the meta-covariate analysis of a renal gene expression dataset in a rat model of salt-sensitive hypertension. We explore the biological insights provided by our analysis of these data. In particular, we identify a highly influential cluster of 13 genes, including three transcription factors (Arntl, Bhlhe41 and Npas2), that is implicated as being protective against hypertension in response to increased dietary sodium. Functional and canonical pathway analysis of this cluster using Ingenuity Pathway Analysis implicated transcriptional activation and circadian rhythm signalling, respectively. Although we illustrate our method using only expression data, the method is applicable to any high-dimensional dataset. Expression data are available at ArrayExpress (accession number E-MEXP-2514) and code is available at http://www.dcs.gla.ac.uk/inference/metacovariateanalysis/.
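To make the idea of a meta-covariate concrete, the following is a minimal sketch and not the authors' implementation. It rests on several assumptions: the data are synthetic, k-means stands in for model-based clustering, the cluster mean is used as the statistical summary, and clustering and classification run as two separate stages rather than being coupled, so the clusters here are not guaranteed to be response-relevant. All function names (`kmeans_genes`, `fit_probit`, `make_toy_data`) are hypothetical.

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF (inverse of the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kmeans_genes(X, k, n_iter=20, seed=0):
    """Cluster genes (rows of X) by expression profile across samples.

    Assumption: k-means stands in for the model-based clustering step.
    Returns gene labels and the k cluster-mean profiles.
    """
    rng = random.Random(seed)
    centres = [list(row) for row in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(n_iter):
        for g, row in enumerate(X):
            dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centres]
            labels[g] = dists.index(min(dists))
        for j in range(k):
            members = [X[g] for g in range(len(X)) if labels[g] == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


def fit_probit(Z, y, lr=0.1, n_iter=500):
    """Fit probit regression weights by gradient ascent on the log-likelihood."""
    n, d = len(Z), len(Z[0])
    w = [0.0] * d
    for _ in range(n_iter):
        grad = [0.0] * d
        for zi, yi in zip(Z, y):
            eta = sum(wj * zj for wj, zj in zip(w, zi))
            p = min(max(norm_cdf(eta), 1e-9), 1.0 - 1e-9)  # clip for stability
            scale = norm_pdf(eta) * (yi - p) / (p * (1.0 - p))
            for j in range(d):
                grad[j] += scale * zi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(Z, w):
    """Predicted probability of class 1 for each row of the design matrix."""
    return [norm_cdf(sum(wj * zj for wj, zj in zip(w, zi))) for zi in Z]


def make_toy_data(n_per_class=10, n_informative=15, n_noise=15, seed=1):
    """Synthetic genes-by-samples matrix; informative genes shift with class."""
    rng = random.Random(seed)
    y = [0] * n_per_class + [1] * n_per_class
    X = [[2.0 * yi + rng.gauss(0.0, 0.5) for yi in y] for _ in range(n_informative)]
    X += [[rng.gauss(0.0, 0.5) for _ in y] for _ in range(n_noise)]
    return X, y


X, y = make_toy_data()
labels, centres = kmeans_genes(X, k=2)
# Design matrix: intercept plus one meta-covariate (cluster mean) per cluster.
Z = [[1.0] + [c[i] for c in centres] for i in range(len(y))]
w = fit_probit(Z, y)
probs = predict_proba(Z, w)
accuracy = sum((p > 0.5) == bool(yi) for p, yi in zip(probs, y)) / len(y)
print("weights:", w, "training accuracy:", accuracy)
```

The design matrix has one column per cluster rather than one per gene, which is what keeps the regression tractable when genes far outnumber samples. A large fitted weight on a cluster's column marks that cluster as influential for the response.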
