An Ontology to Model the International Rules for Multiple Primary Malignant Tumours in Cancer Registration

Population-based cancer registry data provide a key epidemiological resource for monitoring cancer in defined populations. Validation of the data variables contributing to a common data set is necessary to remove statistical bias, and this process is currently performed centrally. An ontology-based approach promises advantages in devolving the validation process to the registry level, but the checks regarding multiple primary tumours have presented a hurdle. This work presents a solution by modelling the international rules for multiple primary cancers in description logic. The topography groupings described in the rules had to be further categorised in order to simplify the axioms. Description logic expressivity was constrained as far as possible to preserve automatic reasoning performance. The axioms consistently trapped all the different types of scenario that signal a violation of the rules. Batch processing of many records was performed using the Web Ontology Language (OWL) application programming interface. Performance issues with large data sets were circumvented by using the software interface to perform the reasoning operations on the basis of the axioms encoded in the ontology. These results remove one remaining hurdle in developing a purely ontology-based solution for performing the European harmonised data-quality checks, with a number of inherent advantages including the formalisation and integration of the validation rules within the domain data model itself.
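
To make the kind of check concrete, the following is a minimal procedural sketch in Python of a multiple-primary test; it is not the description logic axioms of this work, and it does not use the OWL API. It assumes a simplified reading of the rules in which two tumour records for the same patient are flagged when they fall in the same topography group and the same morphology group. The record fields, the `find_violations` function and the two topography groupings are hypothetical and deliberately incomplete; the published rules define the authoritative groupings and contain exceptions that this sketch ignores.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, incomplete lookup: three-character ICD-O topography codes
# that are treated as a single site. Illustrative only.
TOPOGRAPHY_GROUP = {
    "C19": "C19-C20",
    "C20": "C19-C20",
    "C33": "C33-C34",
    "C34": "C33-C34",
}


def topography_group(code):
    """Map a topography code such as 'C20.9' to its (assumed) site group."""
    site = code[:3]
    return TOPOGRAPHY_GROUP.get(site, site)


def find_violations(records):
    """Return (patient_id, tumour_id, tumour_id) for each pair of records
    of one patient sharing both topography group and morphology group."""
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    violations = []
    for patient_id, recs in by_patient.items():
        for a, b in combinations(recs, 2):
            same_site = topography_group(a["topography"]) == topography_group(b["topography"])
            same_morphology = a["morphology_group"] == b["morphology_group"]
            if same_site and same_morphology:
                violations.append((patient_id, a["tumour_id"], b["tumour_id"]))
    return violations


# Toy records; the morphology group is assumed to be precomputed per record.
records = [
    {"patient_id": "P1", "tumour_id": "T1", "topography": "C19.9", "morphology_group": "adenocarcinoma"},
    {"patient_id": "P1", "tumour_id": "T2", "topography": "C20.9", "morphology_group": "adenocarcinoma"},
    {"patient_id": "P1", "tumour_id": "T3", "topography": "C34.1", "morphology_group": "adenocarcinoma"},
    {"patient_id": "P2", "tumour_id": "T4", "topography": "C20.9", "morphology_group": "adenocarcinoma"},
]
violations = find_violations(records)
print(violations)
```

In the ontology-based approach described above, the equivalent grouping logic is encoded as axioms and evaluated by a reasoner, so the rules live in the data model rather than in procedural code like this.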
