Identification of diagnostic subnetwork markers for cancer in human protein-protein interaction network

BackgroundFinding reliable gene markers for accurate disease classification is very challenging due to a number of reasons, including the small sample size of typical clinical data, high noise in gene expression measurements, and the heterogeneity across patients. In fact, gene markers identified in independent studies often do not coincide with each other, suggesting that many of the predicted markers may have no biological significance and may be simply artifacts of the analyzed dataset. To find more reliable and reproducible diagnostic markers, several studies proposed to analyze the gene expression data at the level of groups of functionally related genes, such as pathways. Studies have shown that pathway markers tend to be more robust and yield more accurate classification results. One practical problem of the pathway-based approach is the limited coverage of genes by currently known pathways. As a result, potentially important genes that play critical roles in cancer development may be excluded. To overcome this problem, we propose a novel method for identifying reliable subnetwork markers in a human protein-protein interaction (PPI) network.ResultsIn this method, we overlay the gene expression data with the PPI network and look for the most discriminative linear paths that consist of discriminative genes that are highly correlated to each other. The overlapping linear paths are then optimally combined into subnetworks that can potentially serve as effective diagnostic markers. We tested our method on two independent large-scale breast cancer datasets and compared the effectiveness and reproducibility of the identified subnetwork markers with gene-based and pathway-based markers. We also compared the proposed method with an existing subnetwork-based method.ConclusionsThe proposed method can efficiently find reliable subnetwork markers that outperform the gene-based and pathway-based markers in terms of discriminative power, reproducibility and classification performance. Subnetwork markers found by our method are highly enriched in common GO terms, and they can more accurately classify breast cancer metastasis compared to markers found by a previous method.

[1]  Eytan Domany,et al.  Outcome signature genes in breast cancer: is there a unique set? , 2004, Breast Cancer Research.

[2]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[3]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[4]  Jun Lu,et al.  Pathway level analysis of gene expression using singular value decomposition , 2005, BMC Bioinformatics.

[5]  Frederick P. Roth,et al.  Next generation software for functional trend analysis , 2009, Bioinform..

[6]  J. Tchinda,et al.  Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. , 2006, Science.

[7]  Qing Wang,et al.  Towards precise classification of cancers based on robust gene functional expression profiles , 2005, BMC Bioinformatics.

[8]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[9]  Byung-Jun Yoon,et al.  Identifying reliable subnetwork markers in protein-protein interaction network for classification of breast cancer metastasis , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[10]  E. Dougherty,et al.  Accurate and Reliable Cancer Classification Based on Probabilistic Inference of Pathway Activity , 2009, PloS one.

[11]  Edward R. Dougherty,et al.  Performance of feature-selection methods in the classification of high-dimension data , 2009, Pattern Recognit..

[12]  R. Tibshirani,et al.  Empirical bayes methods and false discovery rates for microarrays , 2002, Genetic epidemiology.

[13]  J. Foekens,et al.  Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer , 2005, The Lancet.

[14]  T. Ideker,et al.  Network-based classification of breast cancer metastasis , 2007, Molecular systems biology.

[15]  Pablo Tamayo,et al.  Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[16]  Trey Ideker,et al.  CellCircuits: a database of protein network models , 2006, Nucleic Acids Res..

[17]  Pierre Baldi,et al.  A Bayesian framework for the analysis of microarray expression data: regularized t -test and statistical inferences of gene changes , 2001, Bioinform..

[18]  BMC Bioinformatics , 2005 .

[19]  Yudong D. He,et al.  Gene expression profiling predicts clinical outcome of breast cancer , 2002, Nature.

[20]  P. Park,et al.  Discovering statistically significant pathways in expression profiling studies. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[21]  W. Symmans,et al.  Breast cancer heterogeneity: evaluation of clonality in primary and metastatic lesions. , 1995, Human pathology.

[22]  T. Kepler,et al.  Normalization and analysis of DNA microarray data by self-consistency and local regression , 2002, Genome Biology.

[23]  Trey Ideker,et al.  Testing for Differentially-Expressed Genes by Maximum-Likelihood Analysis of Microarray Data , 2000, J. Comput. Biol..

[24]  M. Daly,et al.  PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes , 2003, Nature Genetics.

[25]  Y. Chen,et al.  Ratio-based decisions and the quantitative analysis of cDNA microarray images. , 1997, Journal of biomedical optics.

[26]  M. Ashburner,et al.  Gene Ontology: tool for the unification of biology , 2000, Nature Genetics.

[27]  R. Spang,et al.  Predicting the clinical status of human breast cancer by using gene expression profiles , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[28]  Jeffrey T. Chang,et al.  Oncogenic pathway signatures in human cancers as a guide to targeted therapies , 2006, Nature.

[29]  Doheon Lee,et al.  Inferring Pathway Activity toward Precise Disease Classification , 2008, PLoS Comput. Biol..

[30]  E. Lander,et al.  A molecular signature of metastasis in primary solid tumors , 2003, Nature Genetics.