SocialSearch: enhancing entity search with social network matching

This paper introduces the problem of matching people's names to their corresponding social network identities, such as their Twitter accounts. Existing tools for this purpose build upon naive textual matching and inevitably suffer from low accuracy, due to both false positives (e.g., fake impersonator accounts) and false negatives (e.g., accounts using nicknames). To overcome these limitations, we leverage "relational" evidence extracted from the Web corpus. As one example of such evidence, we adopt Web document co-occurrences, which can be interpreted as an "implicit" counterpart of Twitter follower relationships. Using both textual and relational features, we learn a ranking function that aggregates these features to order candidate matches accurately. Another key contribution of this paper is to formulate confidence scoring as a problem separate from relevance ranking. A baseline approach uses the relevance score of the top match itself as the confidence score. In contrast, we train a separate classifier using not only the top relevance score but also various statistical features extracted from the relevance scores of all candidates, and we empirically validate that it outperforms the baseline approach. We evaluate the proposed system on real-life, Internet-scale entity-relationship and social network graphs.
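The two components described above, ranking candidates by aggregated textual and relational features and deriving confidence features from the whole score distribution, can be illustrated with a minimal Python sketch. Everything here is a hypothetical illustration rather than the system evaluated in the paper. The `Candidate` type is assumed. The textual feature uses difflib's `SequenceMatcher` string similarity. The relational feature is assumed to be a Jaccard overlap between names co-occurring with the query on the Web and the names an account follows. The weights `W_TEXT` and `W_REL` are fixed here, whereas the paper learns its ranking function from data. The `confidence_features` helper shows the kind of statistics a separately trained confidence classifier could consume, in contrast to the baseline that uses the top score alone.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from statistics import mean, pstdev


@dataclass
class Candidate:
    handle: str          # hypothetical: account handle, e.g. "@janedoe"
    display_name: str    # name shown on the account profile
    followees: set[str]  # names of people the account follows


def textual_score(query_name: str, cand: Candidate) -> float:
    # Naive string similarity between the query and the profile name, in [0, 1].
    return SequenceMatcher(None, query_name.lower(), cand.display_name.lower()).ratio()


def relational_score(cooccurring: set[str], cand: Candidate) -> float:
    # Assumed relational feature: Jaccard overlap between names that co-occur
    # with the query in Web documents and names the candidate account follows.
    union = cooccurring | cand.followees
    if not union:
        return 0.0
    return len(cooccurring & cand.followees) / len(union)


# Hypothetical fixed weights; a real system would learn them from labeled data.
W_TEXT, W_REL = 0.4, 0.6


def rank(query_name: str, cooccurring: set[str],
         candidates: list[Candidate]) -> list[tuple[float, Candidate]]:
    # Linear aggregation of both features, sorted by descending relevance.
    scored = [
        (W_TEXT * textual_score(query_name, c) + W_REL * relational_score(cooccurring, c), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def confidence_features(scores: list[float]) -> dict[str, float]:
    # Statistics over ALL candidate scores (sorted descending). The baseline
    # would use only scores[0]; these features would instead be fed to a
    # separately trained confidence classifier (training not shown).
    if not scores:
        raise ValueError("need at least one candidate score")
    top = scores[0]
    second = scores[1] if len(scores) > 1 else 0.0
    return {
        "top": top,
        "margin": top - second,
        "mean": mean(scores),
        "std": pstdev(scores),
        "count": float(len(scores)),
    }


if __name__ == "__main__":
    cooc = {"John Roe", "Ann Lee", "Bob Kim"}  # toy co-occurring names
    cands = [
        Candidate("@jane_doe_fan", "Jane Doe Fan", {"Pop Star"}),
        Candidate("@janedoe", "Jane Doe", {"John Roe", "Ann Lee"}),
    ]
    ranked = rank("Jane Doe", cooc, cands)
    for score, cand in ranked:
        print(cand.handle, round(score, 3))
    print(confidence_features([s for s, _ in ranked]))
```

In this toy run, the impersonator-style account `@jane_doe_fan` scores well on text alone but is pushed below `@janedoe` by the relational feature. The large margin between the top two scores is the kind of signal a confidence classifier could exploit beyond the top score itself.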
