Detecting disease associated modules and prioritizing active genes based on high throughput data

Background: The accumulation of high-throughput data has greatly promoted the computational investigation of gene function in the context of complex biological systems. However, a biological function is not simply controlled by an individual gene, since genes act cooperatively to carry out biological processes. In the study of human diseases, identifying disease-associated pathways and modules, rather than only individual disease-related genes, has therefore become an essential problem in systems biology.

Results: In this paper, we propose a novel method to detect disease-related gene modules or dysfunctional pathways based on global characteristics of the interactome coupled with gene expression data. Specifically, we exploit the interactions between genes to define an active score function for each gene based on the kernel trick, which can represent nonlinear effects of gene cooperativity. Modules or pathways are then inferred from the active scores, which are evaluated by support vector regression in a global and integrative manner. The efficiency and robustness of the proposed method are comprehensively validated on both simulated and real data, in comparison with existing methods.

Conclusions: By applying the proposed method to two cancer-related problems, breast cancer and prostate cancer, we identified active modules or dysfunctional pathways related to these two cancers that are supported by evidence in the literature. We show that this network-based method is highly efficient and can be applied to large-scale problems, especially the extraction of modules or pathways related to human disease. Moreover, the method can also be used to prioritize genes associated with a specific phenotype or disease.
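The idea of scoring a gene through its network neighbourhood can be illustrated with a small, self-contained sketch. This is not the method of the paper: the abstract does not specify the kernel or the regression setup, so the sketch assumes a diffusion kernel K = exp(-beta * L) on the interaction graph and replaces support vector regression with a plain kernel-weighted sum of per-gene expression scores. The toy network, the values in `z`, the parameter `beta`, and all function names are hypothetical.

```python
def graph_laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph, as nested lists."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def matmul(A, B):
    """Dense matrix product for small nested-list matrices."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def diffusion_kernel(L, beta=0.5, terms=30):
    """Approximate K = exp(-beta * L) with a truncated Taylor series.

    Adequate for a toy graph; a real network would need a proper
    matrix-exponential routine.
    """
    n = len(L)
    M = [[-beta * x for x in row] for row in L]
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in K]  # current Taylor term, starts at M^0 / 0!
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, M)]  # M^k / k!
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


def active_scores(K, z):
    """Kernel-weighted score: each gene receives a share of its neighbours' signal."""
    return [sum(k_ij * z_j for k_ij, z_j in zip(row, z)) for row in K]


# Hypothetical toy interactome: five genes connected in a chain 0-1-2-3-4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
# Hypothetical per-gene differential expression: only gene 0 is perturbed.
z = [2.0, 0.0, 0.0, 0.0, 0.0]

K = diffusion_kernel(graph_laplacian(5, edges))
scores = active_scores(K, z)
print([round(s, 3) for s in scores])
```

In this sketch, gene 1 receives a positive score even though its own expression value is zero, because it interacts with the perturbed gene 0. This is the kind of cooperative effect that a network kernel is meant to capture.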
