OC-2-KB: A software pipeline to build an evidence-based obesity and cancer knowledge base

Obesity has been linked to several types of cancer. Access to adequate health information encourages people to participate in managing their own health, which ultimately improves their health outcomes. Nevertheless, the existing online information about the relationship between obesity and cancer is heterogeneous and poorly organized. A formal knowledge representation can help organize and deliver quality health information. Several efforts in the biomedical domain currently convert unstructured data to structured data and store the results in Semantic Web knowledge bases (KBs). In this demo paper, we present OC-2-KB (Obesity and Cancer to Knowledge Base), a system that systematically guides automatic KB construction for managing obesity and cancer knowledge from free-text scientific literature (i.e., PubMed abstracts). OC-2-KB has two main modules: one acquires entities, and the other extracts and then classifies the relationships among these entities. We tested OC-2-KB on a data set of 23 manually annotated obesity and cancer PubMed abstracts and created a preliminary KB with 765 triples. We conducted a preliminary evaluation on this sample of triples and report the results.
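
To make the two-module design concrete, the following is a minimal sketch of how entity acquisition followed by relation extraction and classification could yield subject-predicate-object triples. It is not the OC-2-KB implementation: the toy lexicon, the cue-word table, and the names `LEXICON`, `CUES`, `acquire_entities`, `classify_relation`, and `build_triples` are all assumptions introduced for illustration, and simple string matching stands in for whatever methods the actual system uses.

```python
import re
from itertools import combinations
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical toy lexicon (assumption): term -> coarse entity type.
LEXICON = {
    "obesity": "Condition",
    "breast cancer": "Cancer",
    "weight gain": "RiskFactor",
}

# Hypothetical cue words (assumption): cue -> relation label.
CUES = {
    "increases": "increases_risk_of",
    "linked": "associated_with",
}


def acquire_entities(sentence: str) -> List[str]:
    """Module 1 (sketch): return lexicon terms found, in order of appearance."""
    lowered = sentence.lower()
    found = []
    for term in LEXICON:
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match:
            found.append((match.start(), term))
    return [term for _, term in sorted(found)]


def classify_relation(sentence: str) -> Optional[str]:
    """Module 2 (sketch): label the sentence by its first matching cue word."""
    lowered = sentence.lower()
    for cue, predicate in CUES.items():
        if cue in lowered:
            return predicate
    return None


def build_triples(abstract: str) -> List[Triple]:
    """Split an abstract into sentences and emit one triple per entity pair."""
    triples = []
    for sentence in re.split(r"(?<=[.!?])\s+", abstract.strip()):
        predicate = classify_relation(sentence)
        if predicate is None:
            continue  # no recognized relation cue in this sentence
        for subj, obj in combinations(acquire_entities(sentence), 2):
            triples.append(Triple(subj, predicate, obj))
    return triples


if __name__ == "__main__":
    text = "Obesity increases the risk of breast cancer. The weather was fine."
    for triple in build_triples(text):
        print(triple)
```

In a Semantic Web setting, each resulting triple would typically be serialized as RDF with URIs for the subject, predicate, and object; the plain strings here are only a simplification.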
