Broad expertise retrieval in sparse data environments

Expertise retrieval has been largely unexplored on data other than the W3C collection. At the same time, many intranets of universities and other knowledge-intensive organizations offer examples of relatively small but clean multilingual expertise data, covering broad ranges of expertise areas. We first present two main expertise retrieval tasks, along with a set of baseline approaches based on generative language modeling, aimed at finding expertise relations between topics and people. For our experimental evaluation, we introduce (and release) a new test set based on a crawl of a university site. Using this test set, we conduct two series of experiments. The first is aimed at determining the effectiveness of baseline expertise retrieval methods applied to the new test set. The second is aimed at assessing refined models that exploit characteristic features of the new test set, such as the organizational structure of the university and the hierarchical structure of the topics in the test set. Expertise retrieval models are shown to be robust with respect to environments smaller than the W3C collection, and current techniques appear to be generalizable to other settings.
