Learning to rank academic experts in the DBLP dataset

Expert finding is an information retrieval task concerned with identifying the most knowledgeable people on a specific topic, based on documents that describe people's activities. The task takes a user query as input and returns a list of people sorted by their level of expertise with respect to that query. Despite recent interest in the area, current state-of-the-art techniques lack principled approaches for optimally combining different sources of evidence. This article proposes two frameworks for combining multiple estimators of expertise. These estimators are derived from textual content, from the graph structure of citation patterns within the community of experts, and from profile information about the experts. More specifically, the article explores supervised learning-to-rank methods, as well as rank aggregation approaches, for combining all of the estimators of expertise. Several supervised learning algorithms, representative of the pointwise, pairwise and listwise approaches, were tested, and various state-of-the-art data fusion techniques were explored for the rank aggregation framework. Experiments on a dataset of academic publications from the Computer Science domain attest to the adequacy of the proposed approaches.
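To make the rank aggregation idea concrete, here is a minimal sketch of one well-known data fusion technique, CombSUM with min-max normalization. The abstract does not say which fusion techniques the article evaluates, so the choice of CombSUM is an assumption made for illustration. The function names and the toy scores for the three kinds of estimators (text, citation graph, profile) are also hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one estimator's scores to [0, 1]; a constant list maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {expert: 0.0 for expert in scores}
    return {expert: (s - lo) / (hi - lo) for expert, s in scores.items()}


def comb_sum(estimators):
    """Fuse several {expert: score} dicts by summing their normalized scores.

    Returns expert names sorted by fused score, highest first;
    ties are broken alphabetically so the output is deterministic.
    """
    fused = defaultdict(float)
    for scores in estimators:
        for expert, s in min_max_normalize(scores).items():
            fused[expert] += s
    return sorted(fused, key=lambda e: (-fused[e], e))


# Hypothetical toy scores for a single query (not taken from the article).
text_scores = {"alice": 0.9, "bob": 0.4, "carol": 0.1}        # textual similarity
citation_scores = {"alice": 10.0, "bob": 50.0, "carol": 30.0}  # citation-graph evidence
profile_scores = {"alice": 3.0, "bob": 2.0, "carol": 1.0}      # profile evidence

ranking = comb_sum([text_scores, citation_scores, profile_scores])
print(ranking)
```

Normalizing each estimator before summing keeps a large-valued signal, such as raw citation counts, from dominating the fused score. A supervised learning-to-rank model would instead learn from labeled relevance judgments how to weight the same features.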
