Reasoning-Supported Quality Assurance for Knowledge Bases

The increasing application of ontology reuse and automated knowledge acquisition tools in ontology engineering shifts development effort from knowledge modeling towards quality assurance. When ontology reuse or automatic knowledge acquisition is applied, accuracy and conciseness are the two most typical quality problems. Yet, despite their high practical importance, there has been a substantial lack of support for essential quality assurance activities concerning these two quality dimensions. In this thesis, we make a significant step forward in ontology engineering by developing support for two such essential quality assurance activities. We develop a framework and the corresponding tool support for partially automating the inspection of ontologies with respect to accuracy. This is a significant contribution to the field of ontology engineering: manual inspection of ontologies cannot be replaced by ontology debugging or constraint formalization in professional ontology engineering projects, and it is one of the most costly alternatives in quality assurance because of the large amount of user interaction it requires. The framework is based on the assumption that the deductive closure of the correct axioms must be disjoint from the set of incorrect axioms, which holds for all standardized ontology languages with formal semantics. Given this general assumption, we employ reasoning to reduce the number of decisions that a domain expert has to take in order to complete the inspection. Due to its generality, the framework allows for maximal reasoning-based automation across a wide range of ontology modeling languages and a flexible choice of initial constraints applying to the ontology. Since the order of inspection has an impact on the effectiveness of the reasoning-based support, we further propose and compare various axiom ranking techniques used to determine a beneficial order of inspection.
These ranking heuristics are based on the expected accuracy ratio of an ontology and aim at choosing axioms with the highest number of subsequent automatic evaluations. In order to relieve the user from having to provide an estimate of the accuracy ratio in advance, we show that this estimate can effectively be learned on the fly over the course
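
The propagation idea described above (approved axioms entail further correct axioms, and no approved set may entail a declined axiom) can be illustrated with a minimal sketch. It assumes a toy language containing only atomic subclass axioms and uses graph reachability as a stand-in for a real reasoner. The names `entails`, `propagate`, and `decide` are hypothetical and are not taken from the thesis or its tool.

```python
# Toy illustration only: axioms are (sub, sup) pairs read as "sub is a subclass of sup".
Axiom = tuple[str, str]


def entails(axioms: set[Axiom], goal: Axiom) -> bool:
    """Stand-in reasoner: goal holds if sup is reachable from sub via subclass edges."""
    sub, sup = goal
    seen, stack = {sub}, [sub]
    while stack:
        node = stack.pop()
        if node == sup:
            return True
        for a, b in axioms:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def propagate(approved: set[Axiom], declined: set[Axiom], pending: set[Axiom]) -> None:
    """Evaluate pending axioms automatically whenever their status is forced."""
    changed = True
    while changed:
        changed = False
        for ax in sorted(pending):  # sorted() copies, so mutating pending is safe
            if entails(approved, ax):
                # Already a consequence of the approved axioms, so it must be correct.
                pending.remove(ax)
                approved.add(ax)
                changed = True
            elif any(entails(approved | {ax}, bad) for bad in declined):
                # Approving it would entail a declined axiom, so it must be incorrect.
                pending.remove(ax)
                declined.add(ax)
                changed = True


def decide(ax: Axiom, is_correct: bool, approved: set[Axiom],
           declined: set[Axiom], pending: set[Axiom]) -> None:
    """Record one expert decision, then let the reasoner evaluate what it can."""
    pending.discard(ax)
    (approved if is_correct else declined).add(ax)
    propagate(approved, declined, pending)


# Six axioms under inspection (made-up example data).
pending = {("Cat", "Mammal"), ("Mammal", "Animal"), ("Cat", "Animal"),
           ("Cat", "Plant"), ("Animal", "Plant"), ("Mammal", "Plant")}
approved: set[Axiom] = set()
declined: set[Axiom] = set()

manual = 0
for ax, verdict in [(("Cat", "Mammal"), True),
                    (("Mammal", "Animal"), True),
                    (("Cat", "Plant"), False)]:
    decide(ax, verdict, approved, declined, pending)
    manual += 1

print(f"{manual} manual decisions, {len(approved) + len(declined) - manual} automatic")
```

In this toy run the order of decisions matters: declining `("Cat", "Plant")` only forces the other two `Plant` axioms once the `Cat`/`Mammal`/`Animal` chain has been approved, which is the kind of effect an axiom ranking heuristic would try to exploit.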
