An Approach for Incremental Entity Resolution at the Example of Social Media Data

When querying data providers on the web, one has no guarantee that they will reply within a given time. Some providers may not answer at all. This makes it infeasible to wait for a complete result before beginning entity resolution. To solve this problem, we present a query-time entity resolution approach that takes the asynchronous nature of the providers' replies into account by starting entity resolution as soon as the first results are returned. Resolved entities are propagated from the entity resolution engine to the mobile client as early as possible. Resolution results that are produced later are sent to the client as updates and thus improve earlier results.
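
The control flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the provider table `PROVIDERS`, the functions `query`, `key`, and `resolve_incrementally`, and the `send_update` callback are all hypothetical names, and the matching rule (equal normalized names) is a toy stand-in for a real similarity measure. The sketch only shows the idea of resolving each reply as it arrives, pushing every improved entity to the client, and giving up on providers that stay silent past a timeout.

```python
import asyncio

# Hypothetical providers: name -> (reply delay in seconds, records).
# A delay of None simulates a provider that never answers.
PROVIDERS = {
    "fast": (0.01, [{"name": "Cafe Luna", "phone": "123"}]),
    "slow": (0.05, [{"name": "cafe luna ", "url": "luna.example"}]),
    "dead": (None, []),
}

async def query(provider):
    """Simulate an asynchronous request to one data provider."""
    delay, records = PROVIDERS[provider]
    if delay is None:
        await asyncio.sleep(3600)  # effectively never replies
        return []
    await asyncio.sleep(delay)
    return records

def key(record):
    # Toy matching rule (assumption): two records describe the same
    # entity if their names are equal after trimming and lowercasing.
    return record["name"].strip().lower()

async def resolve_incrementally(providers, timeout, send_update):
    """Resolve records reply by reply and push each improved entity."""
    entities = {}
    tasks = [asyncio.create_task(query(p)) for p in providers]
    try:
        # as_completed yields replies in arrival order, not request order.
        for next_reply in asyncio.as_completed(tasks, timeout=timeout):
            for record in await next_reply:
                merged = entities.setdefault(key(record), {})
                for field, value in record.items():
                    merged.setdefault(field, value)  # keep first value seen
                send_update(dict(merged))  # early result or later update
    except asyncio.TimeoutError:
        pass  # stop waiting for providers that did not answer in time
    finally:
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
    return entities
```

Calling `asyncio.run(resolve_incrementally(["fast", "slow", "dead"], 0.5, print))` would first print the entity built from the fast provider alone and then an updated version once the slow provider's record has been merged in; the silent provider is dropped when the timeout expires.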
