Building pathway clusters from Random Forests classification using class votes

Background: Recent years have seen the development of various pathway-based methods for the analysis of microarray gene expression data. These approaches have the potential to bring biological insights into microarray studies. A variety of methods have been proposed to construct networks using gene expression data. Because individual pathways do not act in isolation, it is important to understand how different pathways coordinate to perform cellular functions. However, there are no published methods describing how to build pathway clusters that are closely related to traits of interest.

Results: We propose to build pathway clusters from pathway-based classification methods. The proposed methods allow researchers to identify clusters of pathways sharing similar functions. These pathways may or may not share genes. As an illustration, our approach is applied to three human breast cancer microarray data sets. We found that our methods yielded consistent and interpretable results for these three data sets. We further investigated one of the pathway clusters using PubMatrix. We found that informative genes in this pathway cluster are associated with more publications for keywords such as estrogen receptor than informative genes in other top pathways. In addition, using the shortest path analysis in GeneGo's MetaCore and the Human Protein Reference Database, we were able to identify the links that connect pathways without shared genes within the pathway cluster.

Conclusion: Our proposed pathway clustering methods allow bioinformaticians and biologists to investigate how informative genes within pathways are related to each other and to understand possible crosstalk between pathways in a cluster. Therefore, building pathway clusters may lead to a better understanding of molecular mechanisms affecting a trait of interest, and help generate further biological hypotheses from gene expression data.
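The abstract does not spell out how class votes are turned into clusters, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes that each pathway has already been used to train its own Random Forests classifier and that, for every sample, the fraction of trees voting for one class has been recorded. Pathways whose vote vectors are highly correlated across samples are then grouped together. The function names, the correlation threshold, the single-linkage grouping rule, and the vote values are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length vote vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_pathways(votes, min_corr=0.9):
    """Group pathways whose per-sample class votes correlate at or above
    min_corr (single linkage via union-find). Assumed rule, not the paper's."""
    parent = {name: name for name in votes}

    def find(name):
        # Follow parent links to the cluster representative, compressing paths.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(votes, 2):
        if pearson(votes[a], votes[b]) >= min_corr:
            parent[find(a)] = find(b)

    clusters = {}
    for name in votes:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical vote fractions for one class across five samples (made-up data).
votes = {
    "pathway_A": [0.9, 0.8, 0.2, 0.1, 0.7],
    "pathway_B": [0.85, 0.75, 0.25, 0.15, 0.65],
    "pathway_C": [0.3, 0.6, 0.5, 0.7, 0.2],
}
clusters = cluster_pathways(votes)
```

Because the grouping depends only on how similarly the classifiers vote across samples, two pathways can fall into the same cluster without sharing any genes, which matches the abstract's remark that clustered pathways may or may not share genes.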
