Building a model for disease classification integration in oncology: an approach based on the National Cancer Institute Thesaurus

Background
Identifying incident cancer cases within a population remains essential for scientific research in oncology. Data produced within electronic health records can be useful for this purpose. Because of the multiplicity of providers, heterogeneous terminologies such as ICD-10 and ICD-O-3 are used to record oncology diagnoses. To enable disease identification based on these diagnoses, disease classifications in oncology need to be integrated. Our aim was to build a model integrating the concepts involved in two disease classifications, namely ICD-10 (diagnosis) and ICD-O-3 (topography and morphology), despite their structural heterogeneity.

Methods
Based on the NCIt, a “derivative” model for linking diagnoses and topography-morphology combinations was defined and built. ICD-O-3 and ICD-10 codes were then used to instantiate classes of the “derivative” model. Links between terminologies obtained through the model were then compared to the mappings provided by the Surveillance, Epidemiology, and End Results (SEER) program.

Results
The model integrated 42% of neoplasm ICD-10 codes (excluding metastasis), 98% of ICD-O-3 morphology codes (excluding metastasis) and 68% of ICD-O-3 topography codes. For every code instantiating at least one class in the “derivative” model, comparison with the SEER mappings showed that all mappings were available in the model as a link between the corresponding codes.

Conclusions
We have proposed a method to automatically build a model for integrating ICD-10 and ICD-O-3 based on the NCIt. The resulting “derivative” model is a machine-understandable resource that enables an integrated view of these heterogeneous terminologies. The NCIt structure and the available relationships can help to bridge disease classifications while taking into account their structural and granular heterogeneities.
However, (i) inconsistencies exist within the NCIt, leading to misclassifications in the “derivative” model, and (ii) the “derivative” model integrates only part of ICD-10 and ICD-O-3. The NCIt alone is not sufficient for integration purposes, and further work based on other termino-ontological resources is needed to enrich the model and avoid the identified inconsistencies.
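
The linking idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the dictionary layout and the helper functions (`derive_links`, `missing_from_model`, `coverage`) are assumptions made for illustration, and the single code triple is an example rather than data from the study. It assumes each diagnosis class is defined by one site class and one morphology class, and that codes instantiate those classes.

```python
from itertools import product

# Hypothetical classes of a "derivative" model: each diagnosis class is
# defined by one anatomic-site class and one morphology class.
DIAGNOSIS_CLASSES = {
    "Malignant breast neoplasm": ("Breast", "Malignant neoplasm"),
}

# Illustrative instantiations: which codes instantiate which classes.
ICD10_INSTANCES = {"C50.9": {"Malignant breast neoplasm"}}
TOPOGRAPHY_INSTANCES = {"C50.9": {"Breast"}}            # ICD-O-3 topography
MORPHOLOGY_INSTANCES = {"8000/3": {"Malignant neoplasm"}}  # ICD-O-3 morphology


def codes_for(cls, instances):
    """Return every code that instantiates the given class."""
    return {code for code, classes in instances.items() if cls in classes}


def derive_links():
    """Link each ICD-10 code to (topography, morphology) code pairs
    through the diagnosis classes it instantiates."""
    links = set()
    for icd10, diagnoses in ICD10_INSTANCES.items():
        for diagnosis in diagnoses:
            site, morphology = DIAGNOSIS_CLASSES[diagnosis]
            topo_codes = codes_for(site, TOPOGRAPHY_INSTANCES)
            morph_codes = codes_for(morphology, MORPHOLOGY_INSTANCES)
            for topo, morph in product(topo_codes, morph_codes):
                links.add((icd10, topo, morph))
    return links


def missing_from_model(reference, links):
    """Reference mappings (e.g. from an external source) absent from the model."""
    return set(reference) - links


def coverage(instantiated, all_codes):
    """Fraction of a terminology's codes that instantiate at least one class."""
    all_codes = set(all_codes)
    return len(set(instantiated) & all_codes) / len(all_codes)
```

Under these assumptions, `derive_links()` yields the triples that can be compared against an external mapping table with `missing_from_model`, and `coverage` gives the kind of proportion reported in the Results.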
