Extracting information from heterogeneous information sources using ontologically specified target views

Deluged by rapidly growing volumes of structured and unstructured data in databases, data warehouses, and the global Internet, people increasingly need critical information that is expertly extracted and integrated into personalized views. In this paper we offer a framework for addressing the issues involved, one that allows for the collective efforts of many data and knowledge workers. In the proposed framework we assume that a target view is specified ontologically and independently of any of the sources, and we model both the target and all the sources in the same modeling language. Then, for a given target and source, we generate a target-to-source mapping that has the properties needed to load target facts from source facts. The mapping generator raises specific issues for a user's consideration, but it is endowed with defaults that allow it to run to completion with or without user input. The framework rests on a formal foundation, and we are able to prove that when a source has a valid interpretation, the generated mapping produces a valid interpretation for the part of the target loaded from the source.
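The idea of loading target facts through a target-to-source mapping can be illustrated with a minimal sketch. It is not the paper's modeling language or mapping generator: the source relation `CarAd`, the target predicates, the hand-written mapping, and the functional-constraint check are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Set, Tuple

Fact = Tuple[str, ...]
Facts = Dict[str, Set[Fact]]  # predicate name -> set of facts

# Hypothetical source: one relation of car advertisements (id, make, price).
source: Facts = {
    "CarAd": {("a1", "Ford", "4500"), ("a2", "Honda", "7200")},
}

# Hypothetical target-to-source mapping: each target predicate is paired
# with a query over the source. In the paper a generator derives the
# mapping; here it is written by hand.
mapping: Dict[str, Callable[[Facts], Set[Fact]]] = {
    "Car": lambda s: {(r[0],) for r in s["CarAd"]},
    "Car has Make": lambda s: {(r[0], r[1]) for r in s["CarAd"]},
    "Car has Price": lambda s: {(r[0], r[2]) for r in s["CarAd"]},
}


def load_target(m: Dict[str, Callable[[Facts], Set[Fact]]], s: Facts) -> Facts:
    """Populate every target predicate by running its query on the source."""
    return {predicate: query(s) for predicate, query in m.items()}


def is_functional(facts: Set[Fact]) -> bool:
    """Assumed target constraint: each first component appears at most once."""
    keys = [f[0] for f in facts]
    return len(keys) == len(set(keys))


target = load_target(mapping, source)
```

In this sketch, the loaded part of the target satisfies the assumed constraint whenever each advertisement in the source carries a single make and price, which loosely mirrors the claim that a valid source interpretation yields a valid interpretation for the loaded part of the target.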
