The web has attracted much attention as a new medium that reflects the world's interests in real time, driven by the proliferation of tools such as bulletin boards and weblogs. It is a source from which we can collect and summarize information about a particular object (e.g., a business organization, product, or person). For example, extracting reputation information is a major research topic in information extraction and knowledge extraction from the web. The ability to collect web pages about a particular object is essential for obtaining such information and extracting knowledge from it. A major problem in collecting these pages is that the same object is referred to in different ways in different web documents. For example, a person may be referred to by full name, first name, affiliation and title, or nickname. This paper proposes a method for extracting such mnemonic names of people from the web and presents experimental results on real web data.