Metadata mapping and reuse in caBIG™

Background
This paper proposes that interoperability across biomedical databases can be improved by using a repository of Common Data Elements (CDEs), UML model class-attributes and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG™). The goal is to demonstrate the deployment of open source tools that can be used to map models effectively and to enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs so that their systems interoperate when developed within the caBIG™ framework or other frameworks that use metadata repositories.

Results
The Dice (di-gram) and Dynamic algorithms are compared; both perform similarly when matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding matches are reasonable for the data models examined. This suggests that automatic mapping of UML models and CDEs is feasible within the caBIG™ framework and potentially within any framework that uses a metadata repository.

Conclusion
This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG™. This effort contributes to the development of interoperable systems within caBIG™ as well as other metadata frameworks. Such efforts are critical to addressing the need for systems that can handle the enormous amounts of diverse data that new biomedical methodologies make available.
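To illustrate the kind of lexical matching the Results refer to, the following is a minimal sketch of a Dice coefficient computed over character di-grams. It is not the authors' implementation: the normalization (lowercasing and dropping non-alphanumeric characters), the use of di-gram sets rather than multisets, the function names `digrams`, `dice` and `best_match`, and the example attribute and CDE labels are all assumptions made for illustration.

```python
def digrams(text):
    """Return the set of adjacent character pairs in a normalized string.

    Assumed normalization: lowercase, keep only letters and digits.
    """
    s = "".join(ch for ch in text.lower() if ch.isalnum())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over the two di-gram sets."""
    x, y = digrams(a), digrams(b)
    if not x and not y:
        return 0.0  # neither string yields a di-gram; treat as no match
    return 2 * len(x & y) / (len(x) + len(y))


def best_match(attribute, candidates):
    """Return the (candidate, score) pair with the highest Dice score."""
    return max(((c, dice(attribute, c)) for c in candidates),
               key=lambda pair: pair[1])


# Hypothetical UML class-attribute and CDE labels, for illustration only.
uml_attribute = "patientBirthDate"
cde_labels = ["Patient Birth Date", "Specimen Collection Date"]
print(best_match(uml_attribute, cde_labels))
```

In a sketch like this, a score threshold would typically decide whether the top candidate is accepted automatically or passed to a curator for review; the appropriate threshold depends on the models being mapped.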
