On-the-fly entity resolution from distributed social media sources for mobile search and exploration

We present an approach and a mobile application for interactively exploring and searching geo-located social media entities from different, distributed data providers on the web. Results returned by different providers typically overlap, and there is no guarantee that a provider replies within a given time interval. To present users with geo-located entities in their vicinity in a timely manner, we therefore have to take the asynchronous nature of the providers' replies into account. Our novel on-the-fly entity resolution engine starts resolving entities as soon as it receives the first responses and incrementally extends the entity resolution model as further responses arrive. Entities are propagated to the client once the resolution engine has processed them for the first time. Resolution results produced later are sent to the client as updates that improve earlier, incomplete results. Our experiments show a matching precision of 95% and that the on-the-fly entity resolution scales with respect to the number of resources processed simultaneously.
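The following Python sketch illustrates the general idea of resolving entities as provider replies arrive. It is not the engine described above. The matching rule (haversine distance plus string similarity of names), the thresholds, the record fields `name`, `lat`, and `lon`, and all function and class names are assumptions made for illustration only.

```python
import asyncio
import math
from difflib import SequenceMatcher


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def same_entity(a, b, max_dist_m=100.0, min_name_sim=0.8):
    """Hypothetical matching rule: close in space and similar in name."""
    if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) > max_dist_m:
        return False
    sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return sim >= min_name_sim


class IncrementalResolver:
    """Keeps clusters of matching records and extends them record by record."""

    def __init__(self):
        self.clusters = []  # each cluster is a list of records

    def add(self, record):
        """Return ("new", id) for a first sighting, ("update", id) for a merge."""
        for cid, cluster in enumerate(self.clusters):
            if any(same_entity(record, other) for other in cluster):
                cluster.append(record)
                return ("update", cid)
        self.clusters.append([record])
        return ("new", len(self.clusters) - 1)


async def fetch(provider, delay, records):
    """Stand-in for a provider request; the delay simulates network latency."""
    await asyncio.sleep(delay)
    return provider, records


async def resolve_on_the_fly(providers, timeout=1.0):
    """Process replies in arrival order; give up on providers slower than timeout."""
    resolver = IncrementalResolver()
    events = []  # what would be pushed to the client: (kind, cluster id, provider)
    tasks = [asyncio.create_task(fetch(*p)) for p in providers]
    try:
        for next_reply in asyncio.as_completed(tasks, timeout=timeout):
            provider, records = await next_reply
            for record in records:
                kind, cid = resolver.add(record)
                events.append((kind, cid, provider))
    except asyncio.TimeoutError:
        pass  # late providers are skipped; earlier results stay valid
    finally:
        for task in tasks:
            task.cancel()  # no effect on tasks that already finished
    return resolver, events


if __name__ == "__main__":
    demo_providers = [
        ("A", 0.01, [{"name": "Cafe Central", "lat": 50.0, "lon": 8.0}]),
        ("B", 0.05, [{"name": "Caffe Central", "lat": 50.0001, "lon": 8.0001}]),
    ]
    _, demo_events = asyncio.run(resolve_on_the_fly(demo_providers))
    for event in demo_events:
        print(event)
```

In this sketch a `"new"` event corresponds to propagating an entity to the client the first time it is processed, and an `"update"` event corresponds to a later result that refines an entity the client already holds. The linear scan over clusters is kept for brevity; a real system would likely need blocking or indexing to scale.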
