Using an artificial neural network to map cancer common data elements to the biomedical research integrated domain group model in a semi-automated manner

The medical community uses a variety of data standards for both clinical and research reporting. ISO 11179 Common Data Elements (CDEs) are one such standard, providing robust definitions of individual data points. Another is the Biomedical Research Integrated Domain Group (BRIDG) model, a domain analysis model that provides a contextual framework for biomedical and clinical research data. Mapping CDEs to the BRIDG model is important, in particular because it can facilitate mapping CDEs to other standards. Unfortunately, manual mapping, the current method for creating these CDE mappings, is error-prone and time-consuming, which creates a significant barrier for researchers who use CDEs. In this work, we developed a semi-automated algorithm that maps CDEs to likely BRIDG classes. First, we extended and improved our previously developed artificial neural network (ANN) alignment algorithm. We then used a collection of 1284 CDEs with robust mappings to BRIDG classes as the gold standard to train the network and learn the weights of six CDE attributes. Next, we calculated the similarity between a CDE and each BRIDG class. Finally, the algorithm produces a list of candidate BRIDG classes to which the CDE of interest may belong. For CDEs semantically similar to those used in training, the match rate exceeded 90%; for partially similar CDEs, it was 80%; and for CDEs with drastically different semantics, it reached up to 70%. Our semi-automated mapping process reduces the burden on domain experts. The learned weights of all six attributes are significant. Experimental results indicate that the availability of training data matters more than the semantic similarity of the testing data to the training data. We address overfitting by selecting CDEs randomly and adjusting the ratio of training to verification samples.
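The pipeline summarized above (learn per-attribute weights from gold-standard mappings, score every BRIDG class against a CDE, return a ranked candidate list) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the six attribute names are hypothetical placeholders, `difflib`'s character ratio stands in for whatever per-attribute similarity the ANN alignment algorithm actually computes, the "ANN" is reduced to a single logistic neuron whose input weights play the role of the attribute weights, and "match rate" is assumed to mean the fraction of CDEs whose correct class appears among the top-k candidates.

```python
import math
import random
from difflib import SequenceMatcher

# Assumption: hypothetical attribute names; the abstract says six CDE
# attributes are weighted but does not name them.
ATTRIBUTES = ["name", "definition", "object_class", "property", "value_domain", "concept"]


def attr_similarity(a: str, b: str) -> float:
    """Stand-in per-attribute similarity in [0, 1] (difflib character ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def features(cde: dict, cls: dict) -> list:
    """Six per-attribute similarities between one CDE and one BRIDG class."""
    return [attr_similarity(cde[a], cls[a]) for a in ATTRIBUTES]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, w, b) -> float:
    """Weighted combination of attribute similarities, squashed to (0, 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def build_pairs(cdes, gold, classes):
    """Label 1 for each CDE's gold-standard class, 0 for every other class."""
    return [(features(c, k), 1 if gold[c["id"]] == k["id"] else 0)
            for c in cdes for k in classes]


def train_weights(pairs, epochs=200, lr=0.5, seed=0):
    """Fit one weight per attribute with a single logistic neuron (SGD on log-loss).

    Assumption: a one-neuron stand-in for the paper's ANN, not its architecture.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(ATTRIBUTES), 0.0
    data = list(pairs)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = score(x, w, b) - y  # d(log-loss) / d(pre-activation)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def rank_classes(cde, classes, w, b, k=3):
    """Return the ids of the k BRIDG classes with the highest combined score."""
    ranked = sorted(classes, key=lambda c: score(features(cde, c), w, b), reverse=True)
    return [c["id"] for c in ranked[:k]]


def random_split(items, train_ratio=0.8, seed=0):
    """Random training/verification split; vary train_ratio to probe overfitting."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


def match_rate(cdes, gold, classes, w, b, k=3):
    """Assumed metric: fraction of CDEs whose gold class is among the top-k."""
    hits = sum(gold[c["id"]] in rank_classes(c, classes, w, b, k) for c in cdes)
    return hits / len(cdes)
```

In use, one would split the gold-standard CDEs with `random_split`, build training pairs from the training portion with `build_pairs`, fit `train_weights`, and report `match_rate` on the held-out portion; rerunning with different `train_ratio` values gives a rough picture of how the training/verification ratio affects overfitting. Treating every non-gold class as a negative example keeps the sketch simple but makes the labels heavily unbalanced when there are many BRIDG classes, so a real system might subsample negatives.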
Experimental results on real-world use cases demonstrate the effectiveness and efficiency of the proposed methodology for mapping CDEs to BRIDG classes, both for previously seen CDEs and for new, unseen ones. In addition, it reduces the mapping burden and improves mapping quality.