AEFS: Authoritative Expert Finding System Based on a Language Model and Social Network Analysis

Searching for experts on a given topic is a critical problem in many real-world situations, such as collaborative finding. However, previous work has focused only on finding experts based on occurrences of the query topic in an organization's documents, which means that the experts selected might not be suitable for the task at hand. To resolve this problem, we propose an Authoritative Expert Finding System, called AEFS, which ranks the publications of experts to indicate their level of expertise. AEFS uses non-textual information, e.g., impact factor, to represent the quality of publications, and provides a citation matching function that removes duplicate citations based on the concept of centrality in social network analysis (SNA). In our experiments, we compare a number of related approaches and show that: (1) the proposed approach achieves good performance in terms of the average F-measure; (2) citation matching can reduce the number of training examples required; and (3) non-textual features are very effective for finding experts.
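
The idea of combining a language model with a non-textual quality signal can be illustrated with a minimal sketch. It is not the AEFS implementation: the abstract does not give the scoring formula, so the code below assumes a unigram query-likelihood model with Jelinek-Mercer smoothing, and assumes that each publication's score is multiplied by its impact factor and summed per author. The names `query_likelihood`, `rank_experts`, the `lam` parameter, and the sample publications are all hypothetical.

```python
from collections import Counter, defaultdict


def query_likelihood(query_terms, doc_terms, coll_counts, coll_len, lam=0.5):
    """P(query | document) under a unigram model with Jelinek-Mercer smoothing.

    Assumption: this smoothing choice is illustrative, not taken from the paper.
    """
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    prob = 1.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = coll_counts[term] / coll_len if coll_len else 0.0
        # Interpolate the document model with the collection model.
        prob *= (1 - lam) * p_doc + lam * p_coll
    return prob


def rank_experts(query, publications, lam=0.5):
    """Rank authors by impact-factor-weighted query likelihood of their papers.

    Assumption: multiplying by impact factor is one simple way to inject a
    non-textual quality feature; the paper may combine features differently.
    """
    query_terms = query.lower().split()
    tokenized = [pub["text"].lower().split() for pub in publications]
    coll_counts = Counter(term for doc in tokenized for term in doc)
    coll_len = sum(coll_counts.values())

    scores = defaultdict(float)
    for pub, doc_terms in zip(publications, tokenized):
        p_query = query_likelihood(query_terms, doc_terms, coll_counts, coll_len, lam)
        for author in pub["authors"]:
            scores[author] += pub["impact_factor"] * p_query
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, for illustration only.
publications = [
    {"authors": ["alice"], "text": "expert finding language model", "impact_factor": 3.0},
    {"authors": ["bob"], "text": "protein folding simulation", "impact_factor": 5.0},
    {"authors": ["alice", "carol"], "text": "expert search in enterprise corpora", "impact_factor": 1.0},
]
ranking = rank_experts("expert finding", publications)
```

In this toy setup, an author whose papers match the query in a higher-impact venue accumulates a larger score than one whose papers match only through the smoothed collection model. Duplicate citations would inflate these sums, which is presumably why a citation matching step is applied first.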
