Methodology and mechanisms for federation of heterogeneous metadata sources and ontology development in emerging collaborative environment

Purpose: Leading-edge information and communication technology provides the basis for obtaining, interoperating and federating shared metadata knowledge from multiple heterogeneous data sources in collaborative networks. The purpose of this study is to develop a methodology and a set of mechanisms to support this task in collaborative environments.

Design/methodology/approach: In this paper, the authors first identify and capture four typical sources from which metadata knowledge for shared data can be found or generated in emerging networked environments: existing well-designed metadata, typically the relational schemas of existing databases in the environment; fragmented metadata sources, i.e. metadata that can be derived from existing mission statements and example application scenarios in the environment, usually characterized by their fragmented, lightweight and behavior-intensive features; metadata extracted from simple labeled unstructured data, e.g. textual communications among the environment's stakeholders; and semantic constraints on metadata, e.g. temporal data behavior, which can be generated from governance policies in the environment. Second, the authors introduce a systematic methodology for unifying the resulting metadata. It consists of four semiautomated unification steps that gradually develop and enhance a unified ontology for the environment, formalized in the Web Ontology Language (OWL).

Findings: The methodology steps and their corresponding mechanisms are described and exemplified in detail in this paper. Furthermore, this paper presents the outcome of applying the authors' methodology to an example emerging case by generating a unified ontology for that environment.

Originality/value: The example application area is a real case in the field of higher education in China and therefore serves as a proof of concept and a verification of the effectiveness of the authors' proposed approach.
