Hierarchical Language Models for Expert Finding in Enterprise Corpora

Enterprise corpora contain evidence of what employees work on and can therefore be used to automatically find experts on a given topic. We present a general approach for representing the knowledge of a potential expert as a mixture of language models from associated documents. First, we retrieve documents given the expert's name using a generative probabilistic technique and weight the retrieved documents according to an expert-specific posterior distribution. Then we model the expert indirectly through the set of associated documents, which allows us to exploit their underlying structure and complex language features. Experiments show that our method has excellent performance on the TREC 2005 expert search task and that it effectively collects and combines evidence for expertise in a heterogeneous collection.
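The mixture idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes unigram document models with Dirichlet smoothing, scores an expert e for a query q as P(q|e) = sum over d of P(d|e) * P(q|d), and uses hand-picked weights in place of the expert-specific posterior that name-based retrieval would produce. All names here (`doc_language_model`, `expert_score`, the toy documents, the weights, and `mu`) are hypothetical.

```python
import math
from collections import Counter


def doc_language_model(tokens, collection_counts, collection_len, mu=10.0):
    """Return P(term | document) as a Dirichlet-smoothed unigram model.

    mu is an assumed smoothing parameter, not a value from the paper.
    """
    counts = Counter(tokens)
    doc_len = len(tokens)

    def prob(term):
        p_collection = collection_counts.get(term, 0) / collection_len
        return (counts.get(term, 0) + mu * p_collection) / (doc_len + mu)

    return prob


def expert_score(query_terms, doc_models, doc_weights):
    """Mixture score: sum_d P(d | expert) * prod_t P(t | d)."""
    total = sum(doc_weights.values())  # normalise weights into a distribution
    score = 0.0
    for doc_id, weight in doc_weights.items():
        p_query_given_doc = math.prod(doc_models[doc_id](t) for t in query_terms)
        score += (weight / total) * p_query_given_doc
    return score


# Toy collection (hypothetical data).
docs = {
    "d1": "language models for retrieval".split(),
    "d2": "quarterly budget report".split(),
    "d3": "smoothing language models".split(),
}
collection_counts = Counter(t for tokens in docs.values() for t in tokens)
collection_len = sum(collection_counts.values())
doc_models = {
    doc_id: doc_language_model(tokens, collection_counts, collection_len)
    for doc_id, tokens in docs.items()
}

# Assumed document-expert association weights, standing in for the
# posterior obtained by retrieving documents with the expert's name.
associations = {
    "alice": {"d1": 0.6, "d3": 0.4},
    "bob": {"d2": 0.9, "d1": 0.1},
}

query = ["language", "models"]
scores = {
    expert: expert_score(query, doc_models, weights)
    for expert, weights in associations.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setup the expert whose associated documents mention the query terms ranks first. Modeling the expert through documents, rather than through one merged bag of words, keeps each document's own evidence and weight separate.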
