Information integration using contextual knowledge and ontology merging

With advances in telecommunications and the introduction of the Internet, information systems have achieved physical connectivity but have yet to establish logical connectivity. The lack of logical connectivity can invite disaster, as in the case of the Mars Climate Orbiter, which was lost because one team used metric units and the other English units while exchanging critical maneuver data. In this thesis, we focus on two intertwined subproblems of logical connectivity, namely data extraction and data interpretation, in the domain of heterogeneous information systems. The first challenge, data extraction, is about making it easy to exchange data among semi-structured and structured information systems. We describe the design and implementation of Cameleon, a general-purpose, regular-expression-based wrapper engine with an integrated capabilities-aware planner/optimizer/executioner. The second challenge, data interpretation, deals with the existence of heterogeneous contexts, whereby each source of information and each potential receiver of that information may operate in a different context, leading to large-scale semantic heterogeneity. We extend the existing formalization of the COIN framework with new logical formalisms and features to handle a larger set of heterogeneities between data sources. This extension, named Extended Context Interchange (ECOIN), is motivated by our analysis of financial information systems, which indicates that there are three fundamental types of heterogeneity in data sources: contextual, ontological, and temporal. While the COIN framework was able to deal with contextual heterogeneities, the ECOIN framework expands the scope to include ontological heterogeneities as well. In particular, we are able to deal with equational ontological conflicts (EOC), which refer to heterogeneity in the way data items are calculated from other data items in terms of definitional equations.
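As a rough illustration of what an equational ontological conflict looks like, the sketch below assumes a receiver that expects a data item which the source does not report directly, but which follows from a definitional equation over items the source does report. This is not ECOIN's mechanism, which combines abductive reasoning with symbolic equation solving; it is a minimal stand-in that solves a single linear equation for its one unknown. The function name `solve_for`, the item names, and the figures are all hypothetical.

```python
# Minimal sketch (assumption: definitional equations are linear and written
# as sum(coefficient * item) == 0). Not the ECOIN implementation.

def solve_for(equation, known):
    """Solve a linear definitional equation for its single unknown item.

    equation: dict mapping item name -> coefficient, meaning sum(c * item) == 0
    known:    dict mapping item name -> value reported by the source
    Returns (unknown_item_name, value).
    """
    unknown = [item for item in equation if item not in known]
    if len(unknown) != 1:
        raise ValueError("expected exactly one unknown, got %r" % unknown)
    target = unknown[0]
    # Sum of the known terms; the unknown term must cancel it out.
    rest = sum(c * known[item] for item, c in equation.items() if item != target)
    return target, -rest / equation[target]


# Hypothetical definition: pretax_income = net_income + tax,
# rewritten as pretax_income - net_income - tax == 0.
PRETAX_DEFINITION = {"pretax_income": 1, "net_income": -1, "tax": -1}

# The source reports net income and tax; the receiver wants pretax income.
print(solve_for(PRETAX_DEFINITION, {"net_income": 80.0, "tax": 20.0}))

# The same equation resolves the conflict in the other direction.
print(solve_for(PRETAX_DEFINITION, {"pretax_income": 100.0, "tax": 20.0}))
```

Writing the definition once as an equation, rather than as a one-way formula, is what lets the same knowledge serve receivers that need different items; a symbolic solver generalizes this idea beyond the linear case assumed here.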
ECOIN provides a context-based solution to the EOC problem based on a novel approach that integrates abductive reasoning and symbolic equation solving techniques in a unified framework. Furthermore, we address the merging of independently built ECOIN applications, which involves merging disparate ontologies and contextual knowledge. The relationship between ECOIN and the Semantic Web is also discussed. Finally, we demonstrate the feasibility and features of our integration approach with a prototype implementation that provides mediated access to heterogeneous information systems. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
