Generating a focused view of disease ontology cancer terms for pan-cancer data integration and analysis

Bio-ontologies provide terminologies that let the scientific community describe biomedical entities in a standardized manner. Multiple initiatives are developing biomedical terminologies to provide better annotation, data integration and mining capabilities. Because these terminology resources are devised for different purposes, they inherently diverge in content and structure. A major obstacle to biomedical data integration is the resulting proliferation of overlapping terms, ambiguous classifications and inconsistencies across databases and publications. The Disease Ontology (DO) was developed over the past decade to address data integration, standardization and annotation issues for human disease data. We have established the DO cancer project to provide a focused view of cancer terms within the DO. The project mapped 386 cancer terms from the Catalogue of Somatic Mutations in Cancer (COSMIC), The Cancer Genome Atlas (TCGA), International Cancer Genome Consortium, Therapeutically Applicable Research to Generate Effective Treatments, Integrative Oncogenomics and the Early Detection Research Network into a cohesive set of 187 DO terms, which are in turn represented by 63 top-level DO cancer terms. For example, the COSMIC term ‘kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma’ and the TCGA term ‘Kidney renal clear cell carcinoma’ were both grouped under the DO term ‘DOID:4467 / renal clear cell carcinoma’ (DOID, Disease Ontology Identification), which was mapped to the TopNodes_DOcancerslim term ‘DOID:263 / kidney cancer’. Here, pan-cancer means including or relating to all or multiple types of cancer. Mapping diverse cancer terms to the DO and using top-level terms (DO slims) will enable pan-cancer analysis across datasets generated from any of these cancer term sources. The terms can be browsed on the DO web site (http://www.disease-ontology.org) and downloaded from the DO’s Apache Subversion or GitHub repositories.

Database URL: http://www.disease-ontology.org
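The two-level mapping described above (source term to DO term, then DO term to top-level slim term) can be illustrated with a minimal Python sketch. Only the kidney example from the abstract is encoded; the dictionary names, the `to_slim` and `group_by_slim` helpers, and the sample identifiers are hypothetical and are not part of the DO distribution or of any project file format.

```python
from collections import defaultdict

# Source term -> DO term. Only the example given in the abstract is encoded;
# the dictionary layout itself is an assumption made for illustration.
SOURCE_TO_DO = {
    ("COSMIC", "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma"): "DOID:4467",
    ("TCGA", "Kidney renal clear cell carcinoma"): "DOID:4467",
}

# DO term -> top-level slim term (TopNodes_DOcancerslim in the abstract).
DO_TO_SLIM = {"DOID:4467": "DOID:263"}

# Human-readable labels for the two DO identifiers named in the abstract.
DO_LABELS = {
    "DOID:4467": "renal clear cell carcinoma",
    "DOID:263": "kidney cancer",
}


def to_slim(source, term):
    """Return the top-level slim DOID for a source term, or None if unmapped."""
    doid = SOURCE_TO_DO.get((source, term))
    if doid is None:
        return None
    # A DO term without a slim entry is treated as its own top-level term.
    return DO_TO_SLIM.get(doid, doid)


def group_by_slim(records):
    """Group (source, term, sample_id) records by slim DOID.

    Returns a (groups, unmapped) pair so that unmapped terms stay visible
    instead of being silently dropped.
    """
    groups = defaultdict(list)
    unmapped = []
    for source, term, sample_id in records:
        slim = to_slim(source, term)
        if slim is None:
            unmapped.append((source, term, sample_id))
        else:
            groups[slim].append(sample_id)
    return dict(groups), unmapped


if __name__ == "__main__":
    # Hypothetical sample identifiers, used only to show the grouping.
    records = [
        ("COSMIC", "kidney, NS, carcinoma, clear_cell_renal_cell_carcinoma", "sample_A"),
        ("TCGA", "Kidney renal clear cell carcinoma", "sample_B"),
    ]
    groups, unmapped = group_by_slim(records)
    for slim, samples in groups.items():
        print(slim, DO_LABELS[slim], samples)
```

Under these assumptions, records drawn from COSMIC and TCGA land in the same ‘DOID:263 / kidney cancer’ bucket, which is the kind of cross-source grouping that pan-cancer analysis relies on.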
