Modeling Documents as Mixtures of Persons for Expert Finding

In this paper we address the problem of searching for knowledgeable persons within the enterprise, known as the expert finding (or expert search) task. We present a probabilistic algorithm based on the assumption that the terms in a document are produced by the people mentioned in it. We represent the documents retrieved for a query as mixtures of the candidate experts' language models. We propose two methods for extracting personal language models, as well as a way of combining them with other evidence of expertise. Experiments conducted with the TREC Enterprise collection show that our approach outperforms the best of the existing solutions.
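To make the mixture assumption concrete, here is a minimal Python sketch of one way personal language models could be estimated with expectation-maximization and then used to rank candidates. It is an illustration under stated assumptions, not the paper's actual method: the abstract gives no formulas, so the uniform mixture weights, the additive smoothing constant, the query-likelihood ranking, and the names `estimate_person_models` and `rank_candidates` are all hypothetical.

```python
from collections import Counter, defaultdict


def estimate_person_models(docs, iterations=20, smoothing=1e-6):
    """Estimate p(term | candidate) with EM.

    docs is a list of (terms, candidates) pairs, where candidates are the
    people mentioned in the document. Assumption: each document is a mixture
    of its candidates' models with uniform weights and no background model.
    """
    vocab = sorted({t for terms, _ in docs for t in terms})
    people = sorted({c for _, cands in docs for c in cands})
    # Start every candidate from a uniform term distribution.
    models = {c: {t: 1.0 / len(vocab) for t in vocab} for c in people}
    for _ in range(iterations):
        expected = {c: defaultdict(float) for c in people}
        for terms, cands in docs:
            weight = 1.0 / len(cands)  # assumed uniform mixture weight
            for term, count in Counter(terms).items():
                # E-step: share of this term attributed to each candidate.
                joint = {c: weight * models[c][term] for c in cands}
                total = sum(joint.values())
                for c in cands:
                    expected[c][term] += count * joint[c] / total
        # M-step: turn smoothed expected counts into probabilities.
        for c in people:
            norm = sum(expected[c].values()) + smoothing * len(vocab)
            models[c] = {t: (expected[c][t] + smoothing) / norm for t in vocab}
    return models


def rank_candidates(models, query_terms):
    """Rank candidates by the likelihood of the query under their model."""
    scores = {}
    for candidate, model in models.items():
        score = 1.0
        for term in query_terms:
            # Terms outside the vocabulary get a tiny assumed probability.
            score *= model.get(term, 1e-12)
        scores[candidate] = score
    return sorted(scores, key=scores.get, reverse=True)


# Toy collection: the third document mentions both candidates.
toy_docs = [
    (["xml", "retrieval", "xml"], ["alice"]),
    (["email", "network"], ["bob"]),
    (["xml", "email"], ["alice", "bob"]),
]
toy_models = estimate_person_models(toy_docs)
toy_ranking = rank_candidates(toy_models, ["xml"])
```

In this toy collection, terms in the shared document are gradually attributed to whichever candidate's other documents make them more plausible, which is the intuition behind treating a document as a mixture of the people it mentions.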
