Finding Academic Experts on a MultiSensor Approach using Shannon's Entropy

Expert finding is an information retrieval task concerned with finding the most knowledgeable people on a given topic, based on documents describing people’s activities. The task takes a user query as input and returns a list of people sorted by their level of expertise regarding that query. This paper introduces a novel approach for combining multiple estimators of expertise, based on a multisensor data fusion framework together with the Dempster–Shafer theory of evidence and Shannon’s entropy. More specifically, we define three sensors that detect heterogeneous information derived from the textual contents, from the graph structure of the citation patterns for the community of experts, and from profile information about the academic experts. Given the evidence collected, each sensor may identify different candidates as experts, and consequently the sensors may not agree on a final ranking decision. To deal with these conflicts, we apply the Dempster–Shafer theory of evidence combined with Shannon’s entropy formula to fuse this information and produce a more accurate and reliable final ranking list. Experiments on two datasets of academic publications from the Computer Science domain attest to the adequacy of the proposed approach over traditional state-of-the-art approaches. We also ran experiments against representative supervised state-of-the-art algorithms. The results show that the proposed method achieves performance similar to that of these supervised techniques, confirming the capabilities of the proposed framework.
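To make the fusion step more concrete, the following is a minimal Python sketch of how sensor scores might be fused with Dempster's rule of combination, with Shannon's entropy used to decide how much belief each sensor leaves uncommitted. It is an illustration under stated assumptions, not the formulation used in the paper: the function names, the `THETA` label, the candidate names and scores, and in particular the choice of assigning the normalized entropy of a sensor's scores to the whole frame of discernment are all hypothetical.

```python
import math
from functools import reduce

THETA = "THETA"  # hypothetical label for the whole frame of discernment (ignorance)


def normalized_entropy(scores):
    """Shannon entropy of a sensor's score distribution, scaled to [0, 1]."""
    if len(scores) < 2:
        return 0.0
    total = sum(scores.values())
    probs = [s / total for s in scores.values() if s > 0]
    h = -sum(p * math.log2(p) for p in probs)
    return h / math.log2(len(scores))


def to_mass(scores):
    """Turn non-negative sensor scores (positive sum) into a mass function.

    Assumption: an uncertain (high-entropy) sensor commits less belief to
    individual candidates and leaves the remainder on THETA.
    """
    total = sum(scores.values())
    u = normalized_entropy(scores)
    mass = {c: (1.0 - u) * s / total for c, s in scores.items()}
    mass[THETA] = u
    return mass


def combine(m1, m2):
    """Dempster's rule when focal elements are single candidates plus THETA."""
    candidates = (set(m1) | set(m2)) - {THETA}
    fused = {}
    for c in candidates:
        a, b = m1.get(c, 0.0), m2.get(c, 0.0)
        # {c} arises from {c}&{c}, {c}&THETA and THETA&{c}.
        fused[c] = a * b + a * m2[THETA] + m1[THETA] * b
    fused[THETA] = m1[THETA] * m2[THETA]
    norm = sum(fused.values())  # equals 1 - K, where K is the conflicting mass
    if norm == 0:
        raise ValueError("sensors are in total conflict")
    return {k: v / norm for k, v in fused.items()}


def rank(sensor_scores):
    """Fuse all sensors and return (ranked candidates, fused mass function)."""
    fused = reduce(combine, (to_mass(s) for s in sensor_scores))
    ranking = sorted((c for c in fused if c != THETA),
                     key=lambda c: fused[c], reverse=True)
    return ranking, fused


# Hypothetical scores from the text, citation-graph and profile sensors.
text_sensor = {"alice": 0.7, "bob": 0.2, "carol": 0.1}
citation_sensor = {"alice": 0.3, "bob": 0.6, "carol": 0.1}
profile_sensor = {"alice": 0.5, "bob": 0.3, "carol": 0.2}

ranking, fused = rank([text_sensor, citation_sensor, profile_sensor])
print(ranking)
```

In this sketch the text and citation sensors disagree about the top candidate. Dempster's rule discards the mass assigned to contradictory pairs of candidates and renormalizes the rest, while the entropy term keeps a sensor with a nearly flat score distribution from dominating the fused ranking.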
