EAPB: entropy-aware path-based metric for ontology quality

Background: Entropy has become increasingly popular in computer science and information theory because it can measure the predictability and redundancy of knowledge bases, especially ontologies. However, current entropy-based approaches to ontology evaluation consider only single-point connectivity rather than path connectivity, and they assign equal weights to every entity and path.

Results: We propose an Entropy-Aware Path-Based (EAPB) metric for ontology quality. It uses the path information between vertices, together with the textual information along each path, to compute the path connectivity of the whole network and dynamic weights between nodes. The information obtained from structure-based and text-based embeddings is multiplied by the connectivity matrix used in the entropy computation. EAPB is analytically evaluated against state-of-the-art criteria. We performed an empirical analysis on real-world medical ontologies and a synthetic ontology, covering three aspects: ontology statistical information (data quantity), entropy evaluation (data quality), and a case study (ontology structure and text visualization). These aspects corroborate one another in demonstrating the reliability of the proposed metric. The experimental results show that EAPB can effectively evaluate ontologies, especially those in the medical informatics field.

Conclusions: We leverage path information and textual information to enrich network representation learning and to aid entropy computation. The analysis and assessment of the Semantic Web can benefit not only from structural information but also from textual information. We believe that EAPB is helpful for managing ontology development and evaluation projects. Our results are reproducible, and we will release the source code and ontology of this work after publication (source code and ontology: https://github.com/AnonymousResearcher1/ontologyEvaluate).
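To make the abstract's description more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that path connectivity can be approximated by counting walks up to a fixed length, that the dynamic weight between two nodes can be stood in for by the cosine similarity of their embeddings, and that the weighted connectivity matrix is normalized into a probability distribution before Shannon entropy is taken. The function names, the `max_len` parameter, and the toy embeddings are all hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def path_connectivity(adj, max_len):
    """Count walks of length 1..max_len between every ordered vertex pair.

    In an acyclic hierarchy every walk is also a simple path.
    """
    n = len(adj)
    total = [[0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # adj^1
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
        power = mat_mul(power, adj)  # next power of adj
    return total


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def weighted_entropy(adj, embeddings, max_len=2):
    """Shannon entropy (bits) of the similarity-weighted connectivity matrix.

    Assumption: each pair's weight is its walk count multiplied by the
    non-negative part of the cosine similarity of the node embeddings.
    """
    n = len(adj)
    conn = path_connectivity(adj, max_len)
    weights = [[conn[i][j] * max(cosine(embeddings[i], embeddings[j]), 0.0)
                for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in weights)
    if total == 0:
        return 0.0
    probs = [w / total for row in weights for w in row if w > 0]
    return -sum(p * math.log2(p) for p in probs)


# Toy ontology: a three-node chain 0 -> 1 -> 2 with identical embeddings,
# so every connected pair receives the same weight.
chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
same_embeddings = [[1.0, 0.0]] * 3
chain_entropy = weighted_entropy(chain, same_embeddings, max_len=2)
```

With identical embeddings the three connected pairs (0,1), (1,2) and (0,2) are equally weighted, so `chain_entropy` equals log2(3) bits. Unequal embeddings would skew the distribution and lower the entropy, which is the intuition behind replacing equal weights with dynamic ones.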
