Identification of Personal Name Aliases on the Web

Extracting aliases of an entity is important for various tasks such as identification of relations among entities, web search, and entity disambiguation. To extract relations among entities properly, one must first identify those entities. We propose a novel approach to find aliases of a given name using automatically extracted lexical patterns. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts to design a word co-occurrence model and use it to define various ranking scores that measure the association between a name and a candidate alias. The ranking scores are integrated with page-count-based association measures using support vector machines to produce a robust alias detection method. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names, achieving a statistically significant mean reciprocal rank of 0.6718. Experiments carried out using a dataset of location names and Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different types of named entities and for other languages. Moreover, the aliases extracted using the proposed method improve recall by 20% in a relation-detection task.
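The headline figure above is a mean reciprocal rank (MRR). As a minimal sketch of how that metric is usually computed, assuming the standard definition (the reciprocal of the rank of the first correct alias, averaged over all query names, with 0 when no correct alias is returned), the following Python may help. The function name, the data layout, and the toy names and aliases are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described here.

```python
from typing import Dict, List, Set


def mean_reciprocal_rank(
    ranked: Dict[str, List[str]],
    gold: Dict[str, Set[str]],
) -> float:
    """Average over names of 1/rank of the first correct alias (0 if none)."""
    if not ranked:
        return 0.0
    total = 0.0
    for name, candidates in ranked.items():
        correct = gold.get(name, set())
        # Ranks are 1-based: the top candidate has rank 1.
        for rank, candidate in enumerate(candidates, start=1):
            if candidate in correct:
                total += 1.0 / rank
                break
    return total / len(ranked)


# Hypothetical toy data: ranked candidate aliases per name.
ranked = {
    "name_a": ["wrong_1", "alias_a", "wrong_2"],  # first hit at rank 2
    "name_b": ["alias_b", "wrong_3"],             # first hit at rank 1
    "name_c": ["wrong_4", "wrong_5"],             # no correct alias
}
gold = {
    "name_a": {"alias_a"},
    "name_b": {"alias_b"},
    "name_c": {"alias_c"},
}

score = mean_reciprocal_rank(ranked, gold)
print(f"MRR = {score:.4f}")
```

Under this definition, an MRR of 0.6718 means that the first correct alias tends to appear near the top of the ranked candidate list, on average between the first and second positions.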
