MGOGP: a gene module-based heuristic algorithm for cancer-related gene prioritization

Background: Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. To date, gene-centric and network-centric prioritization methods have predominated. Genes and their protein products carry out cellular processes in the context of functional modules, and dysfunctional gene modules have previously been reported to be associated with cancer. However, gene module information has seldom been considered in cancer-related gene prioritization.

Results: In this study, we propose a novel method, MGOGP (Module and Gene Ontology-based Gene Prioritization), for cancer-related gene prioritization. Unlike other methods, MGOGP ranks genes using information about both individual genes and their affiliated modules, and it uses a Gene Ontology (GO) based fuzzy measure value, together with known cancer-related genes, as heuristics. The performance of the proposed method is validated on both breast cancer and prostate cancer datasets and by comparison with other methods. The results show that MGOGP outperforms the other methods and prioritizes more genes supported by evidence in the literature.

Conclusions: This work will aid researchers in understanding the genetic architecture of complex diseases and in improving the accuracy of diagnosis and the effectiveness of therapy.
