Using hierarchical document evaluation and weighted associations to find experts in enterprise corpora

An expert finding system enables individuals within a large organization to search for people who are authoritative in a given area. Various approaches to expert finding have been proposed, and most can be classified as either document models or candidate models. In this paper, we propose a novel strategy based on the document model to improve the performance of expert finding systems. Since association building and document evaluation are of primary importance to the document model, we estimate the strength of the association between a candidate and a document more accurately than traditional methods by using the count of the candidate's occurrences as a factor, and we improve document evaluation by employing a hierarchical method that incorporates document structure. In addition, we employ two query expansion methods, one based on pseudo-relevance feedback and one based on proximity constraints, to reformulate the original topic and improve the accuracy of expert finding. Finally, we evaluate our methods on the enterprise corpora provided by the Text REtrieval Conference (TREC). Experimental results show that our strategy brings substantial gains to expert finding and delivers strong performance.
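
The abstract does not give the scoring formulas, so the following is only a minimal sketch of how a document-model ranker with count-weighted associations and structure-aware document evaluation might look. It assumes the generic document-model form, in which a candidate's score is the sum over documents of document relevance times association strength. The function names, the field names (`title`, `body`), the field weights, and the normalization of mention counts are all illustrative assumptions, not the authors' method.

```python
from collections import defaultdict


def document_relevance(field_scores, field_weights):
    """Combine per-field query scores into one document score.

    Assumption: a weighted sum over document fields stands in for the
    hierarchical, structure-aware evaluation mentioned in the abstract.
    """
    return sum(field_weights.get(field, 0.0) * score
               for field, score in field_scores.items())


def rank_experts(doc_scores, mentions):
    """Rank candidates with a document-model style aggregation.

    doc_scores: {doc_id: relevance of the document to the query}
    mentions:   {doc_id: {candidate: number of occurrences in the document}}

    Assumption: association strength is the candidate's share of all
    candidate mentions in the document (one possible count-based weighting).
    """
    expert_scores = defaultdict(float)
    for doc_id, relevance in doc_scores.items():
        counts = mentions.get(doc_id, {})
        total = sum(counts.values())
        if total == 0:
            continue  # no candidate is associated with this document
        for candidate, count in counts.items():
            expert_scores[candidate] += relevance * count / total
    return sorted(expert_scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical field scores and weights for two documents.
    weights = {"title": 0.6, "body": 0.4}
    doc_scores = {
        "d1": document_relevance({"title": 1.0, "body": 0.5}, weights),
        "d2": document_relevance({"title": 0.0, "body": 0.5}, weights),
    }
    mentions = {"d1": {"alice": 3, "bob": 1}, "d2": {"bob": 2}}
    for candidate, score in rank_experts(doc_scores, mentions):
        print(candidate, round(score, 3))
```

In this sketch a candidate mentioned three times in a highly relevant document outranks one mentioned once, which is the intuition behind using occurrence counts rather than a binary "appears in the document" association.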
