The WHIRL Approach to Integration: An Overview

We describe a new integration system in which information sources are converted into a highly structured collection of small fragments of text. Database-like queries to this collection are approximated using a novel logic called WHIRL, which combines inference in the style of deductive databases with ranked retrieval methods from information retrieval (IR). WHIRL allows queries that integrate information from multiple information sources without requiring the extraction and normalization of object identifiers that can be used as keys; instead, operations that in conventional databases require equality tests on keys are approximated using IR similarity metrics for text. This reduces the amount of human engineering required to field an integration system.
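
To illustrate the idea of replacing an equality test on keys with a text similarity metric, the following is a minimal sketch of a "similarity join" between two lists of name strings. It is not WHIRL's implementation: the choice of TF-IDF weighting with cosine similarity is an assumption made for this example, and the function names `tokenize`, `tfidf_vectors`, and `similarity_join` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (dict: term -> weight) per document.

    Assumed weighting for this sketch: (1 + log tf) * log((N + 1) / df).
    """
    token_lists = [tokenize(d) for d in docs]
    n = len(token_lists)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in token_lists for t in set(toks))
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vec = {t: (1 + math.log(c)) * math.log((n + 1) / df[t])
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def similarity_join(left, right, k=3):
    """Rank (left, right) pairs by cosine similarity and return the top k.

    This stands in for an equality join: instead of requiring identical
    keys, every pair with some shared vocabulary receives a score.
    """
    # Weight terms over the combined collection so both sides share IDF values.
    vecs = tfidf_vectors(left + right)
    lv, rv = vecs[:len(left)], vecs[len(left):]
    scored = []
    for i, a in enumerate(lv):
        for j, b in enumerate(rv):
            # Vectors are unit length, so the dot product is the cosine.
            s = sum(w * b[t] for t, w in a.items() if t in b)
            if s > 0:
                scored.append((s, left[i], right[j]))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    left = ["AT&T Labs Research", "Carnegie Mellon University"]
    right = ["Carnegie Mellon Univ.", "ATT Labs - Research", "Stanford University"]
    for score, a, b in similarity_join(left, right):
        print(f"{score:.3f}  {a!r} ~ {b!r}")
```

Because the result is a ranked list rather than an exact match set, no shared key or name normalization step is needed; the cost is that some high-scoring pairs may be false matches, so results are best read in rank order.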
