Inferring Pathway Activity toward Precise Disease Classification

The advent of microarray technology has made it possible to classify disease states based on gene expression profiles of patients. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients of different disease states. However, expression-based classification can be challenging in complex diseases due to factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the classification procedure, so that disease is classified based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies, including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease.
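The CORG idea can be illustrated with a small sketch. The abstract does not spell out the scoring function or the search procedure, so everything below is an assumption made for illustration: genes are z-scored across samples, discriminative power is measured by the absolute Welch t-statistic between cases and controls, pathway activity is the sum of member z-scores divided by the square root of the number of members, and the subset is grown greedily until the score stops improving. The function names (`select_corgs`, `activity`, `t_score`, `zscore_rows`) and the toy data are hypothetical.

```python
import math
from statistics import mean, stdev


def zscore_rows(expr):
    """Normalize each gene's expression across samples (mean 0, stdev 1)."""
    out = {}
    for gene, values in expr.items():
        m, s = mean(values), stdev(values)
        out[gene] = [(v - m) / s if s > 0 else 0.0 for v in values]
    return out


def t_score(values, labels):
    """Welch t-statistic between samples labeled 1 (case) and 0 (control).

    Assumes at least two samples in each group.
    """
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    denom = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def activity(z, genes):
    """Per-sample activity: sum of member z-scores scaled by sqrt(k) (assumed)."""
    n = len(z[genes[0]])
    k = len(genes)
    return [sum(z[g][j] for g in genes) / math.sqrt(k) for j in range(n)]


def select_corgs(expr, pathway, labels):
    """Greedy forward search for a discriminative gene subset (illustrative).

    Genes are tried in order of decreasing single-gene |t|; the search stops
    at the first gene that fails to improve the |t| of the combined activity.
    Pathway genes missing from `expr` are skipped.
    """
    z = zscore_rows({g: expr[g] for g in pathway if g in expr})
    ranked = sorted(z, key=lambda g: abs(t_score(z[g], labels)), reverse=True)
    corgs, best = [], 0.0
    for g in ranked:
        score = abs(t_score(activity(z, corgs + [g]), labels))
        if score <= best:
            break
        corgs, best = corgs + [g], score
    return corgs, best, (activity(z, corgs) if corgs else [])


# Toy data (made up): 3 cases followed by 3 controls.
expr = {
    "A": [5, 6, 7, 1, 2, 3],
    "B": [2, 3, 2, 3, 2, 3],
    "C": [1, 5, 2, 5, 1, 3],
}
labels = [1, 1, 1, 0, 0, 0]
corgs, score, act = select_corgs(expr, ["A", "B", "C", "D"], labels)
print(corgs, round(score, 3))
```

The returned activity vector has one value per sample and could serve as a single feature for any standard classifier, with one such feature per pathway. In practice the subset selection would need to be done inside cross-validation so that test labels do not influence which genes are chosen.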
