A Classification Framework for Disambiguating Web People Search Result Using Feedback

This paper addresses the problem of disambiguating Web people search results. Finding information about people is one of the most common activities on the Web. However, the results of searching for person names suffer heavily from ambiguity. In this paper, we propose a classification framework that solves this problem using an additional feedback page. Compared with the traditional solution, which clusters the search results, our framework has lower computational complexity and is more effective. We also develop two new features under the framework that exploit information beyond tokens. Experiments show that these two features improve performance substantially. We also compare the effectiveness of different classification methods on the task.
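
To make the classification idea concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the features or the classifier, so this illustration swaps in plain token-count cosine similarity against the feedback page and a fixed cutoff. The function names and the `threshold` value are hypothetical and are not taken from the paper. The sketch only shows why the approach can be cheaper than clustering: each result page is compared once with the feedback page, rather than with every other result page.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_results(feedback_page, result_pages, threshold=0.5):
    """Return one boolean per result page: True if it resembles the feedback page.

    Assumption: the threshold is an illustrative value, not one from the paper.
    Each page is compared once with the feedback page, so the number of
    comparisons grows linearly with the number of results.
    """
    feedback_vec = Counter(tokenize(feedback_page))
    return [
        cosine(feedback_vec, Counter(tokenize(page))) >= threshold
        for page in result_pages
    ]


# Hypothetical example data: one feedback page and two search results.
feedback = "John Smith professor of computer science at Stanford university"
results = [
    "John Smith teaches computer science courses at Stanford",
    "John Smith plays guitar in a rock band from Texas",
]
labels = classify_results(feedback, results)
```

In this toy example the first result shares most of its tokens with the feedback page and is labeled as the same person, while the second shares only the name tokens and is not. A real system would replace the fixed cutoff with a trained classifier and richer features, as the abstract indicates.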
