Semantic technologies based on logic programming

This thesis describes how Logic Programming (LP) can be used in various knowledge-intensive applications. Its contributions include algorithms and tools for efficient Description Logic reasoning, for the semantic comparison of source programs, and for the semi-automated creation of meta-information. These results are linked in three ways: (1) they all introduce novel semantic technologies, (2) they apply logic programming techniques in general and the Prolog language in particular, and (3) they have all been implemented as working applications.

Semantic technologies form the basis of software systems that work with the meaning of information rather than with its textual form. The main idea is to reason about knowledge represented in a machine-processable way, and the most important parts of this process are knowledge representation and knowledge management. The dissertation deals with two families of knowledge representation and management technologies: Logic Programming and Description Logic systems. Combining these approaches raises interesting challenges; the idea of so-called Description Logic Programming was introduced in 2003. Several results of the present dissertation contribute to this area of semantic technologies.

In the context of Description Logics we have addressed two issues. The first is how to reason efficiently over Description Logic knowledge bases that contain large amounts of data. We have created the theoretical basis of a new Description Logic reasoning system, called DLog, which transforms Description Logic axioms into a Prolog program. This transformation is independent of the individuals: they are accessed dynamically during the Prolog execution of the generated program. To this end, we have developed several algorithms and proved important properties of them.
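The key idea behind DLog, that a program is generated from the terminological axioms alone and the data is consulted only at query time, can be illustrated with a small Python sketch. This is not DLog's algorithm: DLog handles far more expressive Description Logics and generates Prolog code, whereas this sketch assumes a simple Horn fragment (axioms of the form A1 ⊓ ... ⊓ An ⊑ B and ∃R.A ⊑ B), a hypothetical tuple encoding of axioms, and hypothetical helpers `compile_tbox`, `holds` and `instances`. A goal-directed (top-down) proof procedure with an ancestor-based loop check reads the ABox only while answering a query.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """Compiled form of one axiom (sketch): head(X) :- body."""
    head: str
    concepts: Tuple[str, ...] = ()   # body atoms A1(X), ..., An(X)
    role: Optional[str] = None       # R(X, Y), used by existential axioms
    filler: Optional[str] = None     # A(Y), the filler of ∃R.A


def compile_tbox(axioms) -> Dict[str, List[Rule]]:
    """Translate TBox axioms into rules indexed by head concept.

    Assumed (hypothetical) axiom encoding:
      ("and", ["A1", ..., "An"], "B")   for  A1 ⊓ ... ⊓ An ⊑ B
      ("some", "R", "A", "B")           for  ∃R.A ⊑ B
    No individuals are involved at this stage.
    """
    program: Dict[str, List[Rule]] = {}
    for ax in axioms:
        if ax[0] == "and":
            _, body, head = ax
            rule = Rule(head, tuple(body))
        elif ax[0] == "some":
            _, role, filler, head = ax
            rule = Rule(head, role=role, filler=filler)
        else:
            raise ValueError(f"unsupported axiom: {ax!r}")
        program.setdefault(head, []).append(rule)
    return program


@dataclass
class ABox:
    """Instance data, kept entirely separate from the compiled program."""
    concepts: Dict[str, Set[str]] = field(default_factory=dict)
    roles: Dict[str, Set[Tuple[str, str]]] = field(default_factory=dict)

    def individuals(self) -> Set[str]:
        found: Set[str] = set()
        for members in self.concepts.values():
            found |= members
        for pairs in self.roles.values():
            for a, b in pairs:
                found |= {a, b}
        return found


def holds(program, abox, concept, ind, ancestors=frozenset()) -> bool:
    """Top-down proof of concept(ind); the ABox is only read here."""
    goal = (concept, ind)
    if goal in ancestors:  # loop check: a repeated ground goal cannot help
        return False
    if ind in abox.concepts.get(concept, set()):
        return True
    anc = ancestors | {goal}
    for rule in program.get(concept, []):
        if rule.role is None:
            if all(holds(program, abox, c, ind, anc) for c in rule.concepts):
                return True
        else:
            for a, b in abox.roles.get(rule.role, set()):
                if a == ind and holds(program, abox, rule.filler, b, anc):
                    return True
    return False


def instances(program, abox, concept) -> Set[str]:
    return {i for i in abox.individuals() if holds(program, abox, concept, i)}


tbox = [
    ("some", "hasChild", "Person", "Parent"),  # ∃hasChild.Person ⊑ Parent
    ("and", ["Parent", "Male"], "Father"),      # Parent ⊓ Male ⊑ Father
]
program = compile_tbox(tbox)  # built once, before any data is seen
abox = ABox(
    concepts={"Person": {"anna", "bob", "carl"}, "Male": {"bob"}},
    roles={"hasChild": {("bob", "anna"), ("carl", "dave")}},
)
print(sorted(instances(program, abox, "Father")))  # ['bob']
```

Because the compiled program mentions no individual, the same `program` can be reused unchanged as the ABox grows; only the lookups inside `holds` touch the data.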
Using this approach, our DLog implementation achieved better scalability and more efficient execution than earlier approaches.

The second issue is how to use conceptual models for information integration. We have created a transformation framework, together with a modelling methodology, that uses Prolog to query conceptual models formulated in Description Logics. Here we apply the closed world assumption, as we argue that it fits the context of information integration better. We demonstrate this approach by applying it in the SINTAGMA information integration system.

The thesis also discusses two use cases that demonstrate the viability of Logic Programming for knowledge-intensive applications. First, we address the problem of detecting plagiarism between source programs. We have designed and implemented Match, a generic plagiarism detection framework aimed at the semantic comparison of source programs. The idea is to transform the source code into mathematical objects, apply appropriate reduction and comparison methods to these, and interpret the results accordingly. The Prolog implementation of the system has been used successfully at the Budapest University of Technology and Economics for the past eight years to detect plagiarism in homework assignments.

Finally, we investigate how to deduce meta-information in systems that work with various kinds of documents. We have developed the SREngine framework, which manages a pool of generic objects together with their properties and supports Prolog-based reasoning on them. The idea is to infer new properties of the objects from their existing properties and a set of user-defined rules. These rules are written in a special logic-based language developed for SREngine.
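To make the role of the closed world assumption in querying conceptual models concrete, the following minimal Python sketch evaluates the query Person ⊓ ¬∃hasChild.⊤ over a small, hypothetical data set. Under the closed world assumption a missing hasChild assertion counts as false (negation as failure, as in Prolog), so individuals with no recorded child are returned. A standard open world Description Logic reasoner, given no further axioms, would return no certain answers here, since the data might simply be incomplete. The data and function names are illustrative assumptions, not part of SINTAGMA.

```python
from typing import Dict, Set, Tuple

# Hypothetical data, as it might be exposed by the integrated sources.
concepts: Dict[str, Set[str]] = {"Person": {"anna", "bob", "carl"}}
roles: Dict[str, Set[Tuple[str, str]]] = {"hasChild": {("bob", "anna")}}


def has_some(role: str, ind: str) -> bool:
    """∃role.⊤(ind): true iff some stored assertion has ind as subject."""
    return any(a == ind for a, _ in roles.get(role, set()))


def childless_cwa() -> Set[str]:
    """Person ⊓ ¬∃hasChild.⊤ under the closed world assumption:
    an assertion that is not stored is taken to be false."""
    return {p for p in concepts["Person"] if not has_some("hasChild", p)}


print(sorted(childless_cwa()))  # ['anna', 'carl']
```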
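The transform, reduce and compare pipeline described for Match can be illustrated in Python with the standard `ast` and `difflib` modules. This is only a sketch of the general idea under our own assumptions, not Match's actual representation or similarity measure: each program is parsed, reduced to the sequence of its syntax tree node types (which discards identifier names, literal values and layout), and the reduced sequences are compared with `difflib.SequenceMatcher`. The function names `structure` and `similarity` are hypothetical.

```python
import ast
import difflib
from typing import List


def structure(source: str) -> List[str]:
    """Reduce a Python program to the sequence of its AST node types.

    Identifier names, literal values and formatting are dropped, so
    renaming variables or reformatting code does not change the result.
    """
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def similarity(src_a: str, src_b: str) -> float:
    """Similarity in [0, 1] of the two reduced programs."""
    return difflib.SequenceMatcher(None, structure(src_a), structure(src_b)).ratio()


original = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

renamed = """
def summa(items):
    acc = 0
    for it in items:
        acc += it
    return acc
"""

different = """
def total(xs):
    return max(xs) if xs else None
"""

print(similarity(original, renamed))           # 1.0
print(similarity(original, different) < 1.0)   # True
```

Renaming every identifier leaves the score at 1.0, which is the kind of disguise a purely textual comparison would miss; a practical tool would also need reductions that tolerate reordered statements or functions, which this sketch does not attempt.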
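The idea of inferring new object properties from existing ones and user-defined rules can be sketched as naive forward chaining to a fixpoint. The Python code below is a hypothetical illustration, not SREngine's rule language or its Prolog-based engine: properties are assumed to be plain strings, and each rule is assumed to be a pair (set of required properties, derived property), so rules with variables or relations between objects are not covered.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical rule shape: if an object has every property in the
# frozenset, it also gets the derived property.
Rule = Tuple[FrozenSet[str], str]


def infer(objects: Dict[str, Set[str]], rules: List[Rule]) -> Dict[str, Set[str]]:
    """Apply the rules to every object until nothing new can be derived.

    The input is not modified; a copy with the derived properties is returned.
    """
    result = {name: set(props) for name, props in objects.items()}
    for props in result.values():
        changed = True
        while changed:  # naive fixpoint iteration for one object
            changed = False
            for conditions, conclusion in rules:
                if conclusion not in props and conditions <= props:
                    props.add(conclusion)
                    changed = True
    return result


docs = {
    "report.pdf": {"format:pdf", "author:staff", "topic:budget"},
    "notes.txt": {"format:text", "author:student"},
}
rules = [
    (frozenset({"author:staff", "topic:budget"}), "confidential"),
    (frozenset({"confidential"}), "restricted-access"),
]
derived = infer(docs, rules)
print(sorted(derived["report.pdf"] - docs["report.pdf"]))
# ['confidential', 'restricted-access']
print(derived["notes.txt"] == docs["notes.txt"])  # True
```

The second rule fires only because the first one has already added `confidential`, which is why the loop repeats until no rule changes the object.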
