Extracting consistent knowledge from highly inconsistent cancer gene data sources

Background: Hundreds of genes that are causally implicated in oncogenesis have been identified and collected in various databases. To use these abundant but diverse data sources efficiently, it is essential to evaluate their consistency.

Results: First, we showed that the lists of cancer genes from several major data sources were highly inconsistent in terms of overlapping genes. In particular, most cancer genes accumulated in previous small-scale studies could not be rediscovered in current high-throughput genome screening studies. Then, using a metric proposed in this study, we showed that most cancer gene lists from different data sources were highly consistent at the functional level. Finally, we extracted functionally consistent cancer genes from the various data sources and collected them in our database, F-Census.

Conclusions: Although they share very few genes, most cancer gene data sources are highly consistent at the functional level, which indicates that each of them captures a different subset of the genes in a few key pathways associated with cancer. Our results suggest that the sample sizes currently used in cancer studies might be inadequate for consistently capturing individual cancer genes, but could be sufficient for finding a set of cancer genes that functionally represents most cancer genes. The F-Census database provides biologists with a useful tool for browsing and extracting functionally consistent cancer genes from various data sources.
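The contrast between gene-level and functional-level consistency can be illustrated with a toy calculation. The sketch below is not the metric proposed in the study, which the abstract does not define; it substitutes a plain Jaccard index, and the gene names, pathway names, and gene-to-pathway mapping are all hypothetical placeholders chosen only to show how two lists can share few genes while hitting the same pathways.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A & B| / |A | B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pathways_hit(genes, gene_to_pathway):
    """Set of pathways that contain at least one gene from the list."""
    return {gene_to_pathway[g] for g in genes if g in gene_to_pathway}


# Hypothetical annotation: five placeholder genes in two placeholder pathways.
gene_to_pathway = {
    "GENE_A1": "PATHWAY_A",
    "GENE_A2": "PATHWAY_A",
    "GENE_A3": "PATHWAY_A",
    "GENE_B1": "PATHWAY_B",
    "GENE_B2": "PATHWAY_B",
}

# Two hypothetical cancer gene lists that share only one gene.
list_a = {"GENE_A1", "GENE_A2", "GENE_B1"}
list_b = {"GENE_A3", "GENE_B2", "GENE_B1"}

# Gene-level overlap: 1 shared gene out of 5 distinct genes.
gene_overlap = jaccard(list_a, list_b)

# Pathway-level overlap: both lists hit PATHWAY_A and PATHWAY_B.
pathway_overlap = jaccard(
    pathways_hit(list_a, gene_to_pathway),
    pathways_hit(list_b, gene_to_pathway),
)

print(f"gene-level overlap:    {gene_overlap:.2f}")
print(f"pathway-level overlap: {pathway_overlap:.2f}")
```

In this toy case the two lists have a gene-level Jaccard index of 0.20 but a pathway-level index of 1.00, which mirrors the qualitative pattern the abstract describes. A real analysis would need an actual functional annotation and a test of statistical significance.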
