GRIAS: An Entity-Relation Graph Based Framework for Discovering Entity Aliases

Recognizing the various aliases of an entity is a critical task for many applications, including Web search, recommendation systems, and e-discovery. The goal of this paper is to accurately identify entity aliases, especially long-tail ones, in unstructured data. Our solution, GRIAS (a Graph-based framework for discovering entity Aliases), is motivated by the entity relationships that can be collected from both structured and unstructured data. These relationships are used to build an entity-relation graph. A candidate selection method first chooses alias candidates for an entity, and a graph-based similarity is then calculated between the entity and each candidate. Extensive experiments on two real-world datasets demonstrate both the effectiveness and the efficiency of the proposed framework.
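
The abstract does not say which graph-based similarity GRIAS uses, so the following is only a minimal sketch of the general idea: build an undirected entity-relation graph from relation pairs, then rank alias candidates by how much their neighborhoods overlap with the target entity's. The Jaccard overlap here is a stand-in chosen for brevity, not the paper's measure, and the function names (`build_graph`, `neighbor_similarity`, `rank_aliases`) and sample relations are hypothetical.

```python
from collections import defaultdict


def build_graph(relations):
    """Build an undirected graph from (entity, related_entity) pairs."""
    graph = defaultdict(set)
    for a, b in relations:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def neighbor_similarity(graph, u, v):
    """Jaccard overlap of the neighborhoods of u and v.

    Assumption: this is an illustrative stand-in for a graph-based
    similarity, not the measure defined in the paper.
    """
    nu = graph.get(u, set()) - {v}
    nv = graph.get(v, set()) - {u}
    union = nu | nv
    if not union:
        return 0.0
    return len(nu & nv) / len(union)


def rank_aliases(graph, entity, candidates):
    """Return (candidate, score) pairs sorted by descending similarity."""
    scored = [(c, neighbor_similarity(graph, entity, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical relations, as might be extracted from structured and unstructured data.
relations = [
    ("International Business Machines", "Armonk"),
    ("International Business Machines", "Watson"),
    ("IBM", "Armonk"),
    ("IBM", "Watson"),
    ("IBM", "ThinkPad"),
    ("Intel", "Santa Clara"),
]
graph = build_graph(relations)
ranking = rank_aliases(graph, "International Business Machines", ["Intel", "IBM"])
```

In this toy graph, "IBM" shares two of its three neighbors with the target entity while "Intel" shares none, so "IBM" ranks first. A real system would restrict `candidates` with a selection step beforehand, as the abstract describes, rather than scoring every node.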
