Integrating multiple windows and document features for expert finding

Expert finding is a key task in enterprise search and has recently attracted considerable attention from both the research and industry communities. Given a search topic, a prominent existing approach applies an information retrieval (IR) system to retrieve the top-ranking documents, which are then used to derive associations between experts and the search topic based on co-occurrences. However, we argue that expert finding is sensitive to three aspects that current expert finding systems insufficiently address: (a) multiple levels of associations between experts and search topics, (b) document internal structure, and (c) document authority. We propose a novel approach that integrates these three aspects, together with a query expansion technique, in a two-stage model for expert finding. We conduct a systematic evaluation on TREC collections to test the performance of our approach and the effects of multiple windows, document features, and query expansion. The experimental results show that query expansion yields a large and statistically significant improvement in expert finding performance. For three well-known IR models, with or without query expansion, document internal structure improves a single window-based approach, but not to a statistically significant degree, whereas our novel multiple window-based approach significantly improves on a single window-based approach both with and without document internal structure. © 2009 Wiley Periodicals, Inc.
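
To make the idea of window-based co-occurrence concrete, the following is a minimal sketch, not the model evaluated in the paper. It assumes a tokenized document, single-token candidate names, and a hypothetical list of `(half_width, weight)` windows; the function name, window sizes, and weights are all illustrative assumptions. Each candidate mention is scored by counting query terms inside every window around it, so that close co-occurrences contribute through several windows and distant ones through only the widest.

```python
def multi_window_scores(tokens, query_terms, candidates, windows):
    """Score candidates by query-term co-occurrence in several windows.

    windows is a list of (half_width, weight) pairs. These values are
    hypothetical; they are not taken from the paper.
    """
    query = {t.lower() for t in query_terms}
    lowered = [t.lower() for t in tokens]
    scores = {c.lower(): 0.0 for c in candidates}

    for pos, tok in enumerate(lowered):
        if tok not in scores:
            continue  # not a candidate mention
        for half_width, weight in windows:
            lo = max(0, pos - half_width)
            hi = min(len(lowered), pos + half_width + 1)
            # Count query terms that fall inside this window.
            hits = sum(1 for t in lowered[lo:hi] if t in query)
            scores[tok] += weight * hits
    return scores


if __name__ == "__main__":
    text = ("alice tuned the retrieval model . the newsletter was edited "
            "by bob and later it mentioned retrieval")
    result = multi_window_scores(
        text.split(), ["retrieval"], ["alice", "bob"],
        windows=[(3, 1.0), (6, 0.5)],
    )
    print(result)
```

In this toy document the candidate mentioned right next to the query term receives credit from both windows, while the candidate mentioned far from it is reached only by the wider, lower-weighted window. A single-window approach would have to choose one of the two widths and lose the other signal.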
