A multivariate approach for integrating genome-wide expression data and biological knowledge

Motivation: Several statistical methods that combine differential gene expression analysis with biological knowledge databases have been proposed to speed up the interpretation of expression data. However, most of these methods rely on a series of univariate statistical tests and do not properly account for the complex structure of gene interactions.

Results: We present a simple yet effective multivariate statistical procedure for assessing the correlation between a binary phenotype and the subspace defined by a group of genes. A subspace is deemed significant if the samples belonging to the two phenotypes are well separated in that subspace. The separation is measured with Hotelling's T² statistic, which captures the covariance structure of the subspace. When the dimension of the subspace exceeds that of the sample space, we project the original data onto a smaller orthonormal subspace. We use this method to search through functional pathway subspaces defined by Reactome, KEGG, BioCarta and Gene Ontology. To demonstrate its performance, we apply the method to data from two published studies and visualize the results in principal component space.
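The procedure summarized above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The two-sample Hotelling's T² uses the pooled within-group covariance. The "smaller orthonormal subspace" is assumed here to be the leading n − 2 principal axes of all n samples, which keeps the pooled covariance invertible. Significance is assessed by a label-permutation test, which the abstract does not specify. The helper names (`hotelling_t2`, `reduce_if_needed`, `permutation_pvalue`, `scan_gene_sets`, `pca_coordinates`), the gene sets and the synthetic data are all hypothetical.

```python
import numpy as np


def hotelling_t2(x1, x2):
    """Two-sample Hotelling's T^2 for x1 (n1 x p) and x2 (n2 x p)."""
    n1, n2 = len(x1), len(x2)
    diff = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled within-group covariance; invertible only if p <= n1 + n2 - 2.
    pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
              + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    pooled = np.atleast_2d(pooled)  # np.cov gives a 0-d result when p == 1
    return float(n1 * n2 / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff)))


def project_to_subspace(x, k):
    """Scores of the rows of x on its top-k principal axes (orthonormal)."""
    centered = x - x.mean(axis=0)
    # Rows of vt are orthonormal directions in gene space, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def reduce_if_needed(sub, n_samples):
    """Assumption: project onto n - 2 axes when there are more genes than that."""
    k = n_samples - 2
    return project_to_subspace(sub, k) if sub.shape[1] > k else sub


def t2_for_labels(sub, labels):
    """T^2 between samples labelled 1 and samples labelled 0."""
    return hotelling_t2(sub[labels == 1], sub[labels == 0])


def permutation_pvalue(x, labels, genes, n_perm=200, seed=0):
    """Label-permutation p-value for the T^2 of one gene set (assumed test)."""
    rng = np.random.default_rng(seed)
    # The projection ignores labels, so it is computed once and reused.
    sub = reduce_if_needed(x[:, genes], len(labels))
    observed = t2_for_labels(sub, labels)
    exceed = sum(t2_for_labels(sub, rng.permutation(labels)) >= observed
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)


def scan_gene_sets(x, labels, gene_sets, n_perm=200):
    """Rank gene sets (name -> column indices) by permutation p-value."""
    pvals = {name: permutation_pvalue(x, labels, idx, n_perm)
             for name, idx in gene_sets.items()}
    return sorted(pvals.items(), key=lambda item: item[1])


def pca_coordinates(x, genes):
    """First two principal-component scores of a gene set, for plotting."""
    return project_to_subspace(x[:, genes], 2)


# Synthetic example: 10 samples per phenotype, 50 genes (hypothetical data).
rng = np.random.default_rng(1)
x = rng.normal(size=(20, 50))
labels = np.array([1] * 10 + [0] * 10)
x[labels == 1, :5] += 2.0  # shift genes 0-4 in phenotype 1
gene_sets = {"set_A": list(range(5)), "set_B": list(range(20, 45))}
ranking = scan_gene_sets(x, labels, gene_sets)
print(ranking)
```

In this sketch `set_B` has 25 genes, more than n − 2 = 18, so it is projected before T² is computed. `set_A` is small enough to be tested directly. Choosing fewer axes than n − 2 would leave more residual degrees of freedom for the covariance estimate. That choice is left open here because the abstract does not state the target dimension. `pca_coordinates` returns the first two principal-component scores of a gene set, which is one way to draw the kind of principal-component plot the abstract mentions.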
