EntityBases: Compiling, Organizing and Querying Massive Entity Repositories

Current approaches to linking information across sources, often called record linkage, require finding attributes that the sources have in common and comparing records on those attributes. The results are frequently unsatisfactory because sources may be missing information or may contain incorrect or outdated information. We are addressing this problem by developing the technology to build massive entity knowledge bases, which we call EntityBases. The key idea is to create a comprehensive knowledge base for the entities of interest (e.g., companies). To build such a knowledge base, we must address two issues: linking entities with multi-valued attributes obtained from heterogeneous sources, and providing a virtual repository that can be queried efficiently. This paper describes how we have addressed these issues and shows how an EntityBase™ can be used for understanding and linking text documents.
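To make the limitation of attribute-based record linkage concrete, the following is a minimal Python sketch, not the EntityBase method. It assumes records are dictionaries mapping an attribute name to a list of string values (a simple stand-in for multi-valued attributes) and scores a pair of records only on the attributes both of them populate. The function names `similarity` and `link_score`, the sample records, and the use of the standard library's `difflib.SequenceMatcher` as the string measure are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (illustrative measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_score(rec_a: dict, rec_b: dict) -> float:
    """Average, over attributes populated in both records, of the best
    similarity between any pair of values for that attribute."""
    shared = [k for k in rec_a if rec_a[k] and rec_b.get(k)]
    if not shared:
        # No common populated attribute: the records cannot be compared.
        return 0.0
    total = 0.0
    for k in shared:
        total += max(similarity(x, y) for x in rec_a[k] for y in rec_b[k])
    return total / len(shared)


# Hypothetical records for the same company drawn from three sources.
source_a = {"name": ["International Business Machines", "IBM"], "city": ["Armonk"]}
source_b = {"name": ["IBM Corp."], "city": []}   # city is missing
source_c = {"phone": ["555-0100"]}               # no attribute in common

score_ab = link_score(source_a, source_b)  # decided by the name alone
score_ac = link_score(source_a, source_c)  # 0.0: nothing to compare
```

In this sketch, `source_c` can never be linked to `source_a` no matter how the threshold is set, because the two share no populated attribute. That is the kind of failure the abstract attributes to missing information, and it is the motivation for accumulating many values per entity in one knowledge base.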
